Configuration

Inference Gateway provides flexible configuration options to adapt to your specific needs. Because it is a proxy server that brokers access to multiple language model APIs, correct configuration is essential for performance and security.

Configuration Methods

Inference Gateway supports multiple configuration methods to suit different deployment scenarios:

  1. Environment Variables - Recommended for most deployments
  2. Kubernetes ConfigMaps and Secrets - For Kubernetes-based deployments
  3. Configuration Files - For local development and testing

Environment Variables

Environment variables are the primary method for configuring Inference Gateway. These variables control everything from basic server settings to provider-specific API configurations.

General Settings

Variable          Description                                 Default
ENVIRONMENT       Deployment environment                      production
ENABLE_TELEMETRY  Enable OpenTelemetry metrics and tracing    false
ENABLE_AUTH       Enable OIDC authentication                  false

When ENABLE_TELEMETRY is set to true, Inference Gateway exposes a /metrics endpoint for Prometheus scraping and generates distributed traces that can be collected by OpenTelemetry collectors.
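
For example, a minimal Prometheus scrape job for the gateway might look like the following; the target address is an assumption, so point it at wherever the gateway is reachable in your network:

YAML
# prometheus.yml sketch: scrape the gateway's /metrics endpoint
scrape_configs:
  - job_name: 'inference-gateway'
    metrics_path: /metrics
    static_configs:
      - targets: ['inference-gateway:8080']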

OpenID Connect

If authentication is enabled (ENABLE_AUTH=true), configure the following OIDC settings:

Variable            Description         Default
OIDC_ISSUER_URL     OIDC issuer URL     http://keycloak:8080/realms/inference-gateway-realm
OIDC_CLIENT_ID      OIDC client ID      inference-gateway-client
OIDC_CLIENT_SECRET  OIDC client secret  ""

When authentication is enabled, all API requests must include a valid JWT token in the Authorization header:

HTTP
Authorization: Bearer YOUR_JWT_TOKEN
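
For example, with the default Keycloak issuer you can obtain a token through the standard OIDC client-credentials flow and attach it to requests. The token endpoint path below follows Keycloak's convention, so adapt it to your identity provider; the example also assumes jq is available to extract the token:

Terminal
# Request an access token via the client-credentials grant
TOKEN=$(curl -s -X POST \
  http://keycloak:8080/realms/inference-gateway-realm/protocol/openid-connect/token \
  -d grant_type=client_credentials \
  -d client_id=inference-gateway-client \
  -d client_secret=your-client-secret | jq -r '.access_token')

# Attach the token to gateway requests
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/metrics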

Server Settings

These settings control the core HTTP server behavior:

Variable              Description           Default
SERVER_HOST           Server host           0.0.0.0
SERVER_PORT           Server port           8080
SERVER_READ_TIMEOUT   Read timeout          30s
SERVER_WRITE_TIMEOUT  Write timeout         30s
SERVER_IDLE_TIMEOUT   Idle timeout          120s
SERVER_TLS_CERT_PATH  TLS certificate path  ""
SERVER_TLS_KEY_PATH   TLS key path          ""

For production deployments, it's strongly recommended to configure TLS:

Terminal
SERVER_TLS_CERT_PATH=/path/to/certificate.pem
SERVER_TLS_KEY_PATH=/path/to/private-key.pem
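
For local testing you can generate a self-signed certificate with openssl; in production, use a certificate from a trusted CA instead (in Kubernetes, for example, one issued by cert-manager):

Terminal
# Self-signed certificate for local testing only; do not use in production
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout /path/to/private-key.pem \
  -out /path/to/certificate.pem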

Client Settings

These settings control how Inference Gateway connects to third-party APIs:

Variable                        Description                        Default
CLIENT_TIMEOUT                  Client timeout                     30s
CLIENT_MAX_IDLE_CONNS           Maximum idle connections           20
CLIENT_MAX_IDLE_CONNS_PER_HOST  Maximum idle connections per host  20
CLIENT_IDLE_CONN_TIMEOUT        Idle connection timeout            30s
CLIENT_TLS_MIN_VERSION          Minimum TLS version                TLS12

For high-throughput deployments, consider increasing the connection pool settings:

Terminal
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50

Provider Settings

Configure access to the LLM providers you plan to use. At minimum, set an API key for each of them.

OpenAI

Variable        Description     Default
OPENAI_API_URL  OpenAI API URL  https://api.openai.com/v1
OPENAI_API_KEY  OpenAI API Key  ""

Anthropic

Variable           Description        Default
ANTHROPIC_API_URL  Anthropic API URL  https://api.anthropic.com/v1
ANTHROPIC_API_KEY  Anthropic API Key  ""

Cohere

Variable        Description     Default
COHERE_API_URL  Cohere API URL  https://api.cohere.com
COHERE_API_KEY  Cohere API Key  ""

Groq

Variable      Description   Default
GROQ_API_URL  Groq API URL  https://api.groq.com/openai/v1
GROQ_API_KEY  Groq API Key  ""

Ollama

Variable        Description     Default
OLLAMA_API_URL  Ollama API URL  http://ollama:8080/v1
OLLAMA_API_KEY  Ollama API Key  ""

Cloudflare

Variable            Description         Default
CLOUDFLARE_API_URL  Cloudflare API URL  https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai
CLOUDFLARE_API_KEY  Cloudflare API Key  ""

DeepSeek

Variable          Description       Default
DEEPSEEK_API_URL  DeepSeek API URL  https://api.deepseek.com
DEEPSEEK_API_KEY  DeepSeek API Key  ""

Environment Variable File (.env)

For local development, you can use a .env file. Create a file named .env in your project root:

Terminal
# .env file example
ENVIRONMENT=development
ENABLE_TELEMETRY=false
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
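
Make sure the file never reaches version control:

Terminal
echo ".env" >> .gitignore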

Kubernetes ConfigMaps and Secrets

When deploying in Kubernetes, use ConfigMaps for non-sensitive configuration and Secrets for API keys and other sensitive information.

Example ConfigMap

YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway-config
data:
  ENVIRONMENT: 'production'
  ENABLE_TELEMETRY: 'true'
  SERVER_HOST: '0.0.0.0'
  SERVER_PORT: '8080'
  SERVER_READ_TIMEOUT: '30s'
  SERVER_WRITE_TIMEOUT: '30s'
  SERVER_IDLE_TIMEOUT: '120s'
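
If you already maintain a .env file, you can generate an equivalent ConfigMap from it instead of writing the YAML by hand (keep API keys out of it; those belong in a Secret):

Terminal
kubectl create configmap inference-gateway-config --from-env-file=.env --dry-run=client -o yaml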

Example Secret

YAML
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-secrets
type: Opaque
data:
  ANTHROPIC_API_KEY: '<base64-encoded-key>'
  COHERE_API_KEY: '<base64-encoded-key>'
  OPENAI_API_KEY: '<base64-encoded-key>'
  OIDC_CLIENT_SECRET: '<base64-encoded-key>'
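
Values under data must be base64-encoded, for example:

Terminal
# -n keeps a trailing newline out of the encoded value
echo -n 'your-api-key' | base64

To wire both objects into the gateway's pod, reference them with envFrom. The sketch below assumes a standard Deployment; the container name and image are placeholders:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-gateway
  template:
    metadata:
      labels:
        app: inference-gateway
    spec:
      containers:
        - name: inference-gateway
          image: <inference-gateway-image> # placeholder: use the published image for your version
          ports:
            - containerPort: 8080
          envFrom:
            - configMapRef:
                name: inference-gateway-config
            - secretRef:
                name: inference-gateway-secrets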

Complete Configuration Example

Here's a comprehensive example for configuring Inference Gateway in a production environment:

Terminal
# General settings
ENVIRONMENT=production
ENABLE_TELEMETRY=true
ENABLE_AUTH=true

# Authentication
OIDC_ISSUER_URL=https://auth.example.com/realms/inference-gateway
OIDC_CLIENT_ID=inference-gateway
OIDC_CLIENT_SECRET=your-client-secret

# Server settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
SERVER_TLS_CERT_PATH=/certs/tls.crt
SERVER_TLS_KEY_PATH=/certs/tls.key

# Client settings
CLIENT_TIMEOUT=45s
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50
CLIENT_IDLE_CONN_TIMEOUT=60s
CLIENT_TLS_MIN_VERSION=TLS12

# Provider settings
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
GROQ_API_KEY=your-groq-api-key
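
If you save these settings to a file, you can hand them to a container at startup; the image name and paths below are placeholders:

Terminal
# Mount the certificates so the TLS paths above resolve inside the container
docker run --env-file ./production.env \
  -v /path/to/certs:/certs:ro \
  -p 8080:8080 \
  <inference-gateway-image>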

Configuration Best Practices

  1. API Key Security: Never commit API keys to version control. Use environment variables or secrets management.
  2. TLS in Production: Always use TLS in production environments to secure data in transit.
  3. Authentication: Enable authentication in production environments to control access.
  4. Timeouts: Adjust timeouts based on your expected workloads and response times from LLM providers.
  5. Monitoring: Enable telemetry in production for observability and performance tracking.

Next Steps

Once you've configured Inference Gateway, you might want to: