# Configuration
Inference Gateway provides flexible configuration options to adapt to your specific needs. Because it acts as a proxy server in front of various language model APIs, proper configuration is essential for performance and security.
## Configuration Methods
Inference Gateway supports multiple configuration methods to suit different deployment scenarios:
- **Environment Variables** - Recommended for most deployments
- **Kubernetes ConfigMaps and Secrets** - For Kubernetes-based deployments
- **Configuration Files** - For local development and testing
## Environment Variables
Environment variables are the primary method for configuring Inference Gateway. These variables control everything from basic server settings to provider-specific API configurations.
### General Settings

| Variable | Description | Default |
| --- | --- | --- |
| `ENVIRONMENT` | Deployment environment | `production` |
| `ENABLE_TELEMETRY` | Enable OpenTelemetry metrics and tracing | `false` |
| `ENABLE_AUTH` | Enable OIDC authentication | `false` |

When `ENABLE_TELEMETRY` is set to `true`, Inference Gateway exposes a `/metrics` endpoint for Prometheus scraping and generates distributed traces that can be collected by OpenTelemetry collectors.
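As an illustration, a minimal Prometheus scrape job for the gateway might look like the following; the job name and target address are placeholders for your deployment:

```yaml
scrape_configs:
  - job_name: 'inference-gateway' # placeholder job name
    metrics_path: /metrics        # endpoint exposed when ENABLE_TELEMETRY=true
    static_configs:
      - targets: ['inference-gateway:8080'] # replace with your gateway host:port
```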
### OpenID Connect

If authentication is enabled (`ENABLE_AUTH=true`), configure the following OIDC settings:

| Variable | Description | Default |
| --- | --- | --- |
| `OIDC_ISSUER_URL` | OIDC issuer URL | `http://keycloak:8080/realms/inference-gateway-realm` |
| `OIDC_CLIENT_ID` | OIDC client ID | `inference-gateway-client` |
| `OIDC_CLIENT_SECRET` | OIDC client secret | `""` |

When authentication is enabled, all API requests must include a valid JWT token in the Authorization header:

```text
Authorization: Bearer YOUR_JWT_TOKEN
```
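For example, a request through the gateway would attach the token like this. This is a sketch: the `/v1/models` path is an assumed endpoint, so check the API Reference for the actual routes.

```bash
# Hypothetical request; replace the host and path with your gateway's actual endpoint
curl -H "Authorization: Bearer $JWT_TOKEN" \
  https://gateway.example.com/v1/models
```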
### Server Settings

These settings control the core HTTP server behavior:

| Variable | Description | Default |
| --- | --- | --- |
| `SERVER_HOST` | Server host | `0.0.0.0` |
| `SERVER_PORT` | Server port | `8080` |
| `SERVER_READ_TIMEOUT` | Read timeout | `30s` |
| `SERVER_WRITE_TIMEOUT` | Write timeout | `30s` |
| `SERVER_IDLE_TIMEOUT` | Idle timeout | `120s` |
| `SERVER_TLS_CERT_PATH` | TLS certificate path | `""` |
| `SERVER_TLS_KEY_PATH` | TLS key path | `""` |
For production deployments, it's strongly recommended to configure TLS:
```bash
SERVER_TLS_CERT_PATH=/path/to/certificate.pem
SERVER_TLS_KEY_PATH=/path/to/private-key.pem
```
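For local testing only, you can generate a self-signed certificate with OpenSSL and point the two variables above at the resulting files; self-signed certificates are not suitable for production:

```bash
# Generate a self-signed certificate and key, valid for 365 days (local testing only)
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout private-key.pem -out certificate.pem \
  -days 365 -subj "/CN=localhost"
```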
### Client Settings

These settings control how Inference Gateway connects to third-party APIs:

| Variable | Description | Default |
| --- | --- | --- |
| `CLIENT_TIMEOUT` | Client timeout | `30s` |
| `CLIENT_MAX_IDLE_CONNS` | Maximum idle connections | `20` |
| `CLIENT_MAX_IDLE_CONNS_PER_HOST` | Maximum idle connections per host | `20` |
| `CLIENT_IDLE_CONN_TIMEOUT` | Idle connection timeout | `30s` |
| `CLIENT_TLS_MIN_VERSION` | Minimum TLS version | `TLS12` |
For high-throughput deployments, consider increasing the connection pool settings:
```bash
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50
```
### Provider Settings

Configure access to the various LLM providers. At a minimum, configure the providers you plan to use.
#### OpenAI

| Variable | Description | Default |
| --- | --- | --- |
| `OPENAI_API_URL` | OpenAI API URL | `https://api.openai.com/v1` |
| `OPENAI_API_KEY` | OpenAI API Key | `""` |
#### Anthropic

| Variable | Description | Default |
| --- | --- | --- |
| `ANTHROPIC_API_URL` | Anthropic API URL | `https://api.anthropic.com/v1` |
| `ANTHROPIC_API_KEY` | Anthropic API Key | `""` |
#### Cohere

| Variable | Description | Default |
| --- | --- | --- |
| `COHERE_API_URL` | Cohere API URL | `https://api.cohere.com` |
| `COHERE_API_KEY` | Cohere API Key | `""` |
#### Groq

| Variable | Description | Default |
| --- | --- | --- |
| `GROQ_API_URL` | Groq API URL | `https://api.groq.com/openai/v1` |
| `GROQ_API_KEY` | Groq API Key | `""` |
#### Ollama

| Variable | Description | Default |
| --- | --- | --- |
| `OLLAMA_API_URL` | Ollama API URL | `http://ollama:8080/v1` |
| `OLLAMA_API_KEY` | Ollama API Key | `""` |
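The default URL assumes an `ollama` host reachable on port 8080, as in a Docker Compose or Kubernetes setup. If you run Ollama directly on your machine, note that its API listens on port 11434 by default:

```bash
# Point the gateway at a locally running Ollama instance (Ollama's default port)
OLLAMA_API_URL=http://localhost:11434/v1
```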
#### Cloudflare

| Variable | Description | Default |
| --- | --- | --- |
| `CLOUDFLARE_API_URL` | Cloudflare API URL | `https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai` |
| `CLOUDFLARE_API_KEY` | Cloudflare API Key | `""` |

Replace `ACCOUNT_ID` in the URL with your Cloudflare account ID.
#### DeepSeek

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPSEEK_API_URL` | DeepSeek API URL | `https://api.deepseek.com` |
| `DEEPSEEK_API_KEY` | DeepSeek API Key | `""` |
## Environment Variable File (.env)

For local development, you can use a `.env` file. Create a file named `.env` in your project root:
```bash
# .env file example
ENVIRONMENT=development
ENABLE_TELEMETRY=false
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
```
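If you run the gateway as a container, the file can be loaded at startup with `--env-file`; the image name below is a placeholder for whichever image you deploy:

```bash
# Load variables from .env into the container's environment (placeholder image name)
docker run --env-file .env -p 8080:8080 inference-gateway:latest
```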
## Kubernetes ConfigMaps and Secrets
When deploying in Kubernetes, use ConfigMaps for non-sensitive configuration and Secrets for API keys and other sensitive information.
### Example ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway-config
data:
  ENVIRONMENT: 'production'
  ENABLE_TELEMETRY: 'true'
  SERVER_HOST: '0.0.0.0'
  SERVER_PORT: '8080'
  SERVER_READ_TIMEOUT: '30s'
  SERVER_WRITE_TIMEOUT: '30s'
  SERVER_IDLE_TIMEOUT: '120s'
```
### Example Secret

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-secrets
type: Opaque
data:
  ANTHROPIC_API_KEY: '<base64-encoded-key>'
  COHERE_API_KEY: '<base64-encoded-key>'
  OPENAI_API_KEY: '<base64-encoded-key>'
  OIDC_CLIENT_SECRET: '<base64-encoded-key>'
```
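Rather than base64-encoding values by hand, you can let `kubectl` create the Secret from literal values:

```bash
# kubectl base64-encodes the values for you
kubectl create secret generic inference-gateway-secrets \
  --from-literal=OPENAI_API_KEY=your-openai-key \
  --from-literal=ANTHROPIC_API_KEY=your-anthropic-key \
  --from-literal=OIDC_CLIENT_SECRET=your-client-secret
```

To expose both objects to the gateway, reference them from the Pod spec with `envFrom`. This fragment is a sketch; the container name and image are placeholders:

```yaml
# Deployment pod-spec fragment: load all ConfigMap and Secret keys as env vars
spec:
  containers:
    - name: inference-gateway
      image: inference-gateway:latest # placeholder image
      envFrom:
        - configMapRef:
            name: inference-gateway-config
        - secretRef:
            name: inference-gateway-secrets
```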
## Complete Configuration Example
Here's a comprehensive example for configuring Inference Gateway in a production environment:
```bash
# General settings
ENVIRONMENT=production
ENABLE_TELEMETRY=true
ENABLE_AUTH=true

# Authentication
OIDC_ISSUER_URL=https://auth.example.com/realms/inference-gateway
OIDC_CLIENT_ID=inference-gateway
OIDC_CLIENT_SECRET=your-client-secret

# Server settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
SERVER_TLS_CERT_PATH=/certs/tls.crt
SERVER_TLS_KEY_PATH=/certs/tls.key

# Client settings
CLIENT_TIMEOUT=45s
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50
CLIENT_IDLE_CONN_TIMEOUT=60s
CLIENT_TLS_MIN_VERSION=TLS12

# Provider settings
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
GROQ_API_KEY=your-groq-api-key
```
## Configuration Best Practices

- **API Key Security**: Never commit API keys to version control. Use environment variables or secrets management (see the `.gitignore` sketch after this list).
- **TLS in Production**: Always use TLS in production environments to secure data in transit.
- **Authentication**: Enable authentication in production environments to control access.
- **Timeouts**: Adjust timeouts based on your expected workloads and response times from LLM providers.
- **Monitoring**: Enable telemetry in production for observability and performance tracking.
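As a minimal safeguard against committing credentials, exclude local environment files from the repository; the patterns below assume the `.env` naming used earlier:

```text
# .gitignore: keep local environment files out of version control
.env
.env.*
```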
## Next Steps
Once you've configured Inference Gateway, you might want to:
- Check out the API Reference for details on available endpoints
- Explore SDK options for integrating with your application
- Review Observability options for monitoring and logging