Configuration
Inference Gateway provides flexible configuration options to adapt to your specific needs. Because it is a proxy server that brokers access to multiple language model APIs, proper configuration is essential for performance and security.
Configuration Methods
Inference Gateway supports multiple configuration methods to suit different deployment scenarios:
- Environment Variables - Recommended for most deployments
- Kubernetes ConfigMaps and Secrets - For Kubernetes-based deployments
- Configuration Files - For local development and testing
Environment Variables
Environment variables are the primary method for configuring Inference Gateway. These variables control everything from basic server settings to provider-specific API configurations.
General Settings
| Variable | Description | Default |
|---|---|---|
| ENVIRONMENT | Deployment environment | production |
| ENABLE_VISION | Enable vision/multimodal support for all providers | false |
| TELEMETRY_ENABLE | Enable OpenTelemetry metrics and tracing | false |
| AUTH_ENABLE | Enable OIDC authentication | false |
When ENABLE_VISION is set to true, Inference Gateway enables vision/multimodal capabilities, allowing you to send images alongside text in chat completion requests. When disabled (the default, chosen for performance and security reasons), requests containing image content are rejected even if the provider and model support vision.
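For illustration, a vision-enabled request adds image content parts to a chat message. The sketch below assumes the gateway's OpenAI-compatible chat completions endpoint and uses a hypothetical vision-capable model name:

```bash
# Sketch: send text plus an image URL in one message.
# The model name "openai/gpt-4o" is an assumption for illustration.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```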
When TELEMETRY_ENABLE is set to true, Inference Gateway exposes a /metrics endpoint for Prometheus scraping and generates distributed traces that can be collected by OpenTelemetry collectors.
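You can verify telemetry by scraping the endpoint manually. The port below assumes the TELEMETRY_METRICS_PORT value shown in the complete example later on this page (9464); adjust it to match your deployment:

```bash
# Fetch Prometheus metrics from the gateway (port is an assumption).
curl http://localhost:9464/metrics
```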
OpenID Connect
If authentication is enabled (AUTH_ENABLE=true), configure the following OIDC settings:
| Variable | Description | Default |
|---|---|---|
| OIDC_ISSUER_URL | OIDC issuer URL | http://keycloak:8080/realms/inference-gateway-realm |
| OIDC_CLIENT_ID | OIDC client ID | inference-gateway-client |
| OIDC_CLIENT_SECRET | OIDC client secret | "" |
When authentication is enabled, all API requests must include a valid JWT token in the Authorization header:
```
Authorization: Bearer YOUR_JWT_TOKEN
```
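For example, a minimal authenticated request looks like the sketch below (obtaining the token from your OIDC provider is outside the scope of this page, and the endpoint path assumes the gateway's OpenAI-compatible API surface):

```bash
# List available models with a bearer token.
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```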
Server Settings
These settings control the core HTTP server behavior:
| Variable | Description | Default |
|---|---|---|
| SERVER_HOST | Server host | 0.0.0.0 |
| SERVER_PORT | Server port | 8080 |
| SERVER_READ_TIMEOUT | Read timeout | 30s |
| SERVER_WRITE_TIMEOUT | Write timeout | 30s |
| SERVER_IDLE_TIMEOUT | Idle timeout | 120s |
| SERVER_TLS_CERT_PATH | TLS certificate path | "" |
| SERVER_TLS_KEY_PATH | TLS key path | "" |
For production deployments, it's strongly recommended to configure TLS:
```bash
SERVER_TLS_CERT_PATH=/path/to/certificate.pem
SERVER_TLS_KEY_PATH=/path/to/private-key.pem
```
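For local testing only, you can generate a self-signed certificate with OpenSSL; production deployments should use certificates issued by a trusted CA:

```bash
# Self-signed certificate valid for 365 days (local testing only).
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout private-key.pem -out certificate.pem \
  -days 365 -subj "/CN=localhost"
```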
Client Settings
These settings control how Inference Gateway connects to third-party APIs:
| Variable | Description | Default |
|---|---|---|
| CLIENT_TIMEOUT | Client timeout | 30s |
| CLIENT_MAX_IDLE_CONNS | Maximum idle connections | 20 |
| CLIENT_MAX_IDLE_CONNS_PER_HOST | Maximum idle connections per host | 20 |
| CLIENT_IDLE_CONN_TIMEOUT | Idle connection timeout | 30s |
| CLIENT_TLS_MIN_VERSION | Minimum TLS version | TLS12 |
For high-throughput deployments, consider increasing the connection pool settings:
```bash
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50
```
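How these values are supplied depends on your deployment. As one sketch, a container runtime can pass them as environment variables at startup (the image name here is an assumption for illustration):

```bash
# Pass tuned connection-pool settings at container start.
# Image name is assumed for illustration.
docker run -p 8080:8080 \
  -e CLIENT_MAX_IDLE_CONNS=100 \
  -e CLIENT_MAX_IDLE_CONNS_PER_HOST=50 \
  ghcr.io/inference-gateway/inference-gateway:latest
```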
Provider Settings
Configure access to your LLM providers. At a minimum, set credentials for each provider you plan to use.
OpenAI
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_URL | OpenAI API URL | https://api.openai.com/v1 |
| OPENAI_API_KEY | OpenAI API Key | "" |
Anthropic
| Variable | Description | Default |
|---|---|---|
| ANTHROPIC_API_URL | Anthropic API URL | https://api.anthropic.com/v1 |
| ANTHROPIC_API_KEY | Anthropic API Key | "" |
Cohere
| Variable | Description | Default |
|---|---|---|
| COHERE_API_URL | Cohere API URL | https://api.cohere.com |
| COHERE_API_KEY | Cohere API Key | "" |
Groq
| Variable | Description | Default |
|---|---|---|
| GROQ_API_URL | Groq API URL | https://api.groq.com/openai/v1 |
| GROQ_API_KEY | Groq API Key | "" |
Ollama
| Variable | Description | Default |
|---|---|---|
| OLLAMA_API_URL | Ollama API URL | http://ollama:8080/v1 |
| OLLAMA_API_KEY | Ollama API Key | "" |
| OLLAMA_CLOUD_API_URL | Ollama Cloud API URL | https://ollama.com/v1 |
| OLLAMA_CLOUD_API_KEY | Ollama Cloud API Key | "" |
Cloudflare
| Variable | Description | Default |
|---|---|---|
| CLOUDFLARE_API_URL | Cloudflare API URL | https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai |
| CLOUDFLARE_API_KEY | Cloudflare API Key | "" |
DeepSeek
| Variable | Description | Default |
|---|---|---|
| DEEPSEEK_API_URL | DeepSeek API URL | https://api.deepseek.com |
| DEEPSEEK_API_KEY | DeepSeek API Key | "" |
Google
| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_URL | Google AI API URL | https://generativelanguage.googleapis.com/v1beta/openai |
| GOOGLE_API_KEY | Google AI API Key | "" |
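With at least one provider key set, requests can be routed through the gateway. The sketch below assumes the gateway's OpenAI-compatible chat completions endpoint and provider-prefixed model names; check the API Reference for the exact scheme:

```bash
# Route a chat completion to OpenAI through the gateway.
# The "provider/model" naming is an assumption for illustration.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```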
Model Context Protocol (MCP) Settings
These settings control MCP integration for external tool access:
| Variable | Description | Default |
|---|---|---|
| MCP_ENABLE | Enable MCP middleware | false |
| MCP_EXPOSE | Expose MCP endpoints for debugging | false |
| MCP_SERVERS | Comma-separated list of MCP server URLs | "" |
| MCP_CLIENT_TIMEOUT | MCP client timeout | 10s |
| MCP_DIAL_TIMEOUT | MCP dial timeout | 5s |
| MCP_TLS_HANDSHAKE_TIMEOUT | MCP TLS handshake timeout | 5s |
| MCP_RESPONSE_HEADER_TIMEOUT | MCP response header timeout | 5s |
| MCP_EXPECT_CONTINUE_TIMEOUT | MCP expect continue timeout | 2s |
| MCP_REQUEST_TIMEOUT | MCP request timeout | 10s |
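For example, to enable the middleware and attach two MCP servers (the URLs below are placeholders for your own servers):

```bash
MCP_ENABLE=true
MCP_EXPOSE=false
# Placeholder URLs; point these at your own MCP servers.
MCP_SERVERS=http://mcp-time-server:8081/mcp,http://mcp-search-server:8082/mcp
MCP_CLIENT_TIMEOUT=10s
```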
UI Settings
These settings control the Inference Gateway UI:
| Variable | Description | Default |
|---|---|---|
| INFERENCE_GATEWAY_URL | The URL of the Inference Gateway server | http://localhost:8080/v1 |
Logging and Debugging
These settings control logging and debugging behavior:
| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Set logging level (debug, info, warn, error) | info |
Environment Variable File (.env)
For local development, you can use a .env file. Create a file named .env in your project root:
```bash
# .env file example
ENVIRONMENT=development
TELEMETRY_ENABLE=false
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
```
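One common way to load this file is Docker Compose's env_file directive, sketched below (the image name is an assumption for illustration):

```yaml
# docker-compose.yml sketch: load settings from .env at startup.
services:
  inference-gateway:
    image: ghcr.io/inference-gateway/inference-gateway:latest # image name assumed
    env_file: .env
    ports:
      - '8080:8080'
```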
Kubernetes ConfigMaps and Secrets
When deploying in Kubernetes, use ConfigMaps for non-sensitive configuration and Secrets for API keys and other sensitive information.
Example ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway-config
data:
  ENVIRONMENT: 'production'
  TELEMETRY_ENABLE: 'true'
  SERVER_HOST: '0.0.0.0'
  SERVER_PORT: '8080'
  SERVER_READ_TIMEOUT: '30s'
  SERVER_WRITE_TIMEOUT: '30s'
  SERVER_IDLE_TIMEOUT: '120s'
```
Example Secret
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-secrets
type: Opaque
data:
  ANTHROPIC_API_KEY: '<base64-encoded-key>'
  COHERE_API_KEY: '<base64-encoded-key>'
  OPENAI_API_KEY: '<base64-encoded-key>'
  OIDC_CLIENT_SECRET: '<base64-encoded-key>'
```
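A Deployment can then consume both with envFrom; a minimal sketch follows (the container image is an assumption for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-gateway
  template:
    metadata:
      labels:
        app: inference-gateway
    spec:
      containers:
        - name: inference-gateway
          image: ghcr.io/inference-gateway/inference-gateway:latest # image name assumed
          ports:
            - containerPort: 8080
          envFrom:
            # Inject all keys from the ConfigMap and Secret as env vars.
            - configMapRef:
                name: inference-gateway-config
            - secretRef:
                name: inference-gateway-secrets
```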
Complete Configuration Example
Here's a comprehensive example for configuring Inference Gateway in a production environment:
```bash
# General settings
ENVIRONMENT=production
ALLOWED_MODELS=
ENABLE_VISION=false
DEBUG_CONTENT_TRUNCATE_WORDS=10
DEBUG_MAX_MESSAGES=100
# Telemetry
TELEMETRY_ENABLE=false
TELEMETRY_METRICS_PORT=9464
# Model Context Protocol (MCP)
MCP_ENABLE=false
MCP_EXPOSE=false
MCP_SERVERS=
MCP_CLIENT_TIMEOUT=5s
MCP_DIAL_TIMEOUT=3s
MCP_TLS_HANDSHAKE_TIMEOUT=3s
MCP_RESPONSE_HEADER_TIMEOUT=3s
MCP_EXPECT_CONTINUE_TIMEOUT=1s
MCP_REQUEST_TIMEOUT=5s
MCP_MAX_RETRIES=3
MCP_RETRY_INTERVAL=5s
MCP_INITIAL_BACKOFF=1s
MCP_ENABLE_RECONNECT=true
MCP_RECONNECT_INTERVAL=30s
MCP_POLLING_ENABLE=true
MCP_POLLING_INTERVAL=30s
MCP_POLLING_TIMEOUT=5s
MCP_DISABLE_HEALTHCHECK_LOGS=true
# Authentication
AUTH_ENABLE=false
OIDC_ISSUER_URL=http://keycloak:8080/realms/inference-gateway-realm
OIDC_CLIENT_ID=inference-gateway-client
OIDC_CLIENT_SECRET=
# Server settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
SERVER_TLS_CERT_PATH=
SERVER_TLS_KEY_PATH=
# Client settings
CLIENT_TIMEOUT=30s
CLIENT_MAX_IDLE_CONNS=20
CLIENT_MAX_IDLE_CONNS_PER_HOST=20
CLIENT_IDLE_CONN_TIMEOUT=30s
CLIENT_TLS_MIN_VERSION=TLS12
CLIENT_DISABLE_COMPRESSION=true
CLIENT_RESPONSE_HEADER_TIMEOUT=10s
CLIENT_EXPECT_CONTINUE_TIMEOUT=1s
# Providers
ANTHROPIC_API_URL=https://api.anthropic.com/v1
ANTHROPIC_API_KEY=
CLOUDFLARE_API_URL=https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai
CLOUDFLARE_API_KEY=
COHERE_API_URL=https://api.cohere.com
COHERE_API_KEY=
GROQ_API_URL=https://api.groq.com/openai/v1
GROQ_API_KEY=
OLLAMA_API_URL=http://ollama:8080/v1
OLLAMA_API_KEY=
OLLAMA_CLOUD_API_URL=https://ollama.com/v1
OLLAMA_CLOUD_API_KEY=
OPENAI_API_URL=https://api.openai.com/v1
OPENAI_API_KEY=
DEEPSEEK_API_URL=https://api.deepseek.com
DEEPSEEK_API_KEY=
GOOGLE_API_URL=https://generativelanguage.googleapis.com/v1beta/openai
GOOGLE_API_KEY=
MISTRAL_API_URL=https://api.mistral.ai/v1
MISTRAL_API_KEY=
```
Configuration Best Practices
- API Key Security: Never commit API keys to version control. Use environment variables or a secrets manager (see the kubectl example after this list).
- TLS in Production: Always use TLS in production environments to secure data in transit.
- Authentication: Enable authentication in production environments to control access.
- Timeouts: Adjust timeouts based on your expected workloads and response times from LLM providers.
- Monitoring: Enable telemetry in production for observability and performance tracking.
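For example, in Kubernetes you can create the secret shown earlier directly from literals, so plaintext keys never land in a manifest file:

```bash
# Create the secret from literals; kubectl handles base64 encoding.
kubectl create secret generic inference-gateway-secrets \
  --from-literal=OPENAI_API_KEY='your-openai-key' \
  --from-literal=ANTHROPIC_API_KEY='your-anthropic-key' \
  --from-literal=OIDC_CLIENT_SECRET='your-client-secret'
```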
Next Steps
Once you've configured Inference Gateway, you might want to:
- Check out the API Reference for details on available endpoints
- Explore SDK options for integrating with your application
- Review Observability options for monitoring and logging