Configuration

Inference Gateway provides flexible configuration options to adapt to your specific needs. As a proxy server designed to facilitate access to various language model APIs, proper configuration is essential for optimal performance and security.

Configuration Methods

Inference Gateway supports multiple configuration methods to suit different deployment scenarios:

  1. Environment Variables - Recommended for most deployments
  2. Kubernetes ConfigMaps and Secrets - For Kubernetes-based deployments
  3. Configuration Files - For local development and testing

Environment Variables

Environment variables are the primary method for configuring Inference Gateway. These variables control everything from basic server settings to provider-specific API configurations.

General Settings

| Variable | Description | Default |
|----------|-------------|---------|
| ENVIRONMENT | Deployment environment | production |
| ENABLE_VISION | Enable vision/multimodal support for all providers | false |
| TELEMETRY_ENABLE | Enable OpenTelemetry metrics and tracing | false |
| AUTH_ENABLE | Enable OIDC authentication | false |

When ENABLE_VISION is set to true, Inference Gateway enables vision/multimodal capabilities, allowing you to send images alongside text in chat completion requests. When disabled (the default, for performance and security reasons), requests containing image content are rejected even if the provider and model support vision.
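A multimodal request follows the OpenAI-style content-array shape. A minimal sketch, assuming a local gateway and placeholder model and image values:

```shell
# Example multimodal request (only accepted when ENABLE_VISION=true).
# The model name, gateway URL, and image URL below are placeholders.
payload='{
  "model": "openai/gpt-4o",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
    ]
  }]
}'

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload"
```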

When TELEMETRY_ENABLE is set to true, Inference Gateway exposes a /metrics endpoint for Prometheus scraping and generates distributed traces that can be collected by OpenTelemetry collectors.
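Metrics can then be collected with a standard Prometheus scrape job. A minimal sketch, assuming a local gateway exposing metrics on port 9464 (the TELEMETRY_METRICS_PORT value used in the complete example later on; adjust the target to your deployment):

```yaml
scrape_configs:
  - job_name: 'inference-gateway'
    static_configs:
      - targets: ['localhost:9464']
```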

OpenID Connect

If authentication is enabled (AUTH_ENABLE=true), configure the following OIDC settings:

| Variable | Description | Default |
|----------|-------------|---------|
| OIDC_ISSUER_URL | OIDC issuer URL | http://keycloak:8080/realms/inference-gateway-realm |
| OIDC_CLIENT_ID | OIDC client ID | inference-gateway-client |
| OIDC_CLIENT_SECRET | OIDC client secret | "" |

When authentication is enabled, all API requests must include a valid JWT token in the Authorization header:

HTTP
Authorization: Bearer YOUR_JWT_TOKEN
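For example, a chat completion request with the token attached might look like the following (the gateway URL, endpoint path, and model name are placeholders for a local setup):

```shell
# With AUTH_ENABLE=true, every API request needs a bearer token.
# $YOUR_JWT_TOKEN, the URL, and the model name are placeholders.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```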

Server Settings

These settings control the core HTTP server behavior:

| Variable | Description | Default |
|----------|-------------|---------|
| SERVER_HOST | Server host | 0.0.0.0 |
| SERVER_PORT | Server port | 8080 |
| SERVER_READ_TIMEOUT | Read timeout | 30s |
| SERVER_WRITE_TIMEOUT | Write timeout | 30s |
| SERVER_IDLE_TIMEOUT | Idle timeout | 120s |
| SERVER_TLS_CERT_PATH | TLS certificate path | "" |
| SERVER_TLS_KEY_PATH | TLS key path | "" |

For production deployments, it's strongly recommended to configure TLS:

Terminal
SERVER_TLS_CERT_PATH=/path/to/certificate.pem
SERVER_TLS_KEY_PATH=/path/to/private-key.pem
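For local experimentation before you have CA-issued certificates, a self-signed pair can be generated with openssl (a sketch for testing only; the filenames and subject are placeholders):

```shell
# Generate a self-signed certificate for LOCAL TESTING ONLY;
# production deployments should use certificates from a trusted CA.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout private-key.pem -out certificate.pem \
  -days 365 -subj "/CN=localhost"
```

Point SERVER_TLS_CERT_PATH and SERVER_TLS_KEY_PATH at the generated files.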

Client Settings

These settings control how Inference Gateway connects to third-party APIs:

| Variable | Description | Default |
|----------|-------------|---------|
| CLIENT_TIMEOUT | Client timeout | 30s |
| CLIENT_MAX_IDLE_CONNS | Maximum idle connections | 20 |
| CLIENT_MAX_IDLE_CONNS_PER_HOST | Maximum idle connections per host | 20 |
| CLIENT_IDLE_CONN_TIMEOUT | Idle connection timeout | 30s |
| CLIENT_TLS_MIN_VERSION | Minimum TLS version | TLS12 |

For high-throughput deployments, consider increasing the connection pool settings:

Terminal
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50

Provider Settings

Configure access to various LLM providers. At minimum, you should configure the providers you plan to use.

OpenAI

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_URL | OpenAI API URL | https://api.openai.com/v1 |
| OPENAI_API_KEY | OpenAI API Key | "" |

Anthropic

| Variable | Description | Default |
|----------|-------------|---------|
| ANTHROPIC_API_URL | Anthropic API URL | https://api.anthropic.com/v1 |
| ANTHROPIC_API_KEY | Anthropic API Key | "" |

Cohere

| Variable | Description | Default |
|----------|-------------|---------|
| COHERE_API_URL | Cohere API URL | https://api.cohere.com |
| COHERE_API_KEY | Cohere API Key | "" |

Groq

| Variable | Description | Default |
|----------|-------------|---------|
| GROQ_API_URL | Groq API URL | https://api.groq.com/openai/v1 |
| GROQ_API_KEY | Groq API Key | "" |

Ollama

| Variable | Description | Default |
|----------|-------------|---------|
| OLLAMA_API_URL | Ollama API URL | http://ollama:8080/v1 |
| OLLAMA_API_KEY | Ollama API Key | "" |
| OLLAMA_CLOUD_API_URL | Ollama Cloud API URL | https://ollama.com/v1 |
| OLLAMA_CLOUD_API_KEY | Ollama Cloud API Key | "" |

Cloudflare

| Variable | Description | Default |
|----------|-------------|---------|
| CLOUDFLARE_API_URL | Cloudflare API URL | https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai |
| CLOUDFLARE_API_KEY | Cloudflare API Key | "" |

DeepSeek

| Variable | Description | Default |
|----------|-------------|---------|
| DEEPSEEK_API_URL | DeepSeek API URL | https://api.deepseek.com |
| DEEPSEEK_API_KEY | DeepSeek API Key | "" |

Google

| Variable | Description | Default |
|----------|-------------|---------|
| GOOGLE_API_URL | Google AI API URL | https://generativelanguage.googleapis.com/v1 |
| GOOGLE_API_KEY | Google AI API Key | "" |

Model Context Protocol (MCP) Settings

These settings control MCP integration for external tool access:

| Variable | Description | Default |
|----------|-------------|---------|
| MCP_ENABLE | Enable MCP middleware | false |
| MCP_EXPOSE | Expose MCP endpoints for debugging | false |
| MCP_SERVERS | Comma-separated list of MCP server URLs | "" |
| MCP_CLIENT_TIMEOUT | MCP client timeout | 10s |
| MCP_DIAL_TIMEOUT | MCP dial timeout | 5s |
| MCP_TLS_HANDSHAKE_TIMEOUT | MCP TLS handshake timeout | 5s |
| MCP_RESPONSE_HEADER_TIMEOUT | MCP response header timeout | 5s |
| MCP_EXPECT_CONTINUE_TIMEOUT | MCP expect continue timeout | 2s |
| MCP_REQUEST_TIMEOUT | MCP request timeout | 10s |
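Putting these together, enabling MCP with a couple of tool servers might look like this (the server hostnames and ports are hypothetical placeholders for your own MCP deployments):

```shell
# Example: enable MCP middleware with two hypothetical tool servers
MCP_ENABLE=true
MCP_SERVERS="http://mcp-filesystem:8081/mcp,http://mcp-search:8082/mcp"
MCP_CLIENT_TIMEOUT=10s
```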

UI Settings

These settings control the Inference Gateway UI:

| Variable | Description | Default |
|----------|-------------|---------|
| INFERENCE_GATEWAY_URL | The URL of the Inference Gateway server | http://localhost:8080/v1 |

Logging and Debugging

These settings control logging and debugging behavior:

| Variable | Description | Default |
|----------|-------------|---------|
| LOG_LEVEL | Set logging level (debug, info, warn, error) | info |

Environment Variable File (.env)

For local development, you can use a .env file. Create a file named .env in your project root:

Terminal
# .env file example
ENVIRONMENT=development
TELEMETRY_ENABLE=false
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
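If you start the gateway binary directly rather than through a tool that reads .env files, the variables can be exported into your shell first. A minimal sketch:

```shell
# Export every variable from .env into the current shell before
# starting the gateway ("set -a" auto-exports each assignment).
set -a
[ -f .env ] && . ./.env
set +a
```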

Kubernetes ConfigMaps and Secrets

When deploying in Kubernetes, use ConfigMaps for non-sensitive configuration and Secrets for API keys and other sensitive information.

Example ConfigMap

YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway-config
data:
  ENVIRONMENT: 'production'
  TELEMETRY_ENABLE: 'true'
  SERVER_HOST: '0.0.0.0'
  SERVER_PORT: '8080'
  SERVER_READ_TIMEOUT: '30s'
  SERVER_WRITE_TIMEOUT: '30s'
  SERVER_IDLE_TIMEOUT: '120s'

Example Secret

YAML
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-secrets
type: Opaque
data:
  ANTHROPIC_API_KEY: '<base64-encoded-key>'
  COHERE_API_KEY: '<base64-encoded-key>'
  OPENAI_API_KEY: '<base64-encoded-key>'
  OIDC_CLIENT_SECRET: '<base64-encoded-key>'
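Values in the data field of a Secret must be base64-encoded. For example (using a placeholder key; printf rather than echo avoids encoding a trailing newline):

```shell
# Base64-encode an API key for a Kubernetes Secret.
# 'your-api-key' is a placeholder for a real key.
printf '%s' 'your-api-key' | base64
```

Alternatively, kubectl create secret generic handles the encoding for you, and the Deployment can inject both resources with envFrom (configMapRef and secretRef).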

Complete Configuration Example

Here's a comprehensive example for configuring Inference Gateway in a production environment:

Terminal

# General settings
ENVIRONMENT=production
ALLOWED_MODELS=
ENABLE_VISION=false
DEBUG_CONTENT_TRUNCATE_WORDS=10
DEBUG_MAX_MESSAGES=100
# Telemetry
TELEMETRY_ENABLE=false
TELEMETRY_METRICS_PORT=9464
# Model Context Protocol (MCP)
MCP_ENABLE=false
MCP_EXPOSE=false
MCP_SERVERS=
MCP_CLIENT_TIMEOUT=5s
MCP_DIAL_TIMEOUT=3s
MCP_TLS_HANDSHAKE_TIMEOUT=3s
MCP_RESPONSE_HEADER_TIMEOUT=3s
MCP_EXPECT_CONTINUE_TIMEOUT=1s
MCP_REQUEST_TIMEOUT=5s
MCP_MAX_RETRIES=3
MCP_RETRY_INTERVAL=5s
MCP_INITIAL_BACKOFF=1s
MCP_ENABLE_RECONNECT=true
MCP_RECONNECT_INTERVAL=30s
MCP_POLLING_ENABLE=true
MCP_POLLING_INTERVAL=30s
MCP_POLLING_TIMEOUT=5s
MCP_DISABLE_HEALTHCHECK_LOGS=true
# Authentication
AUTH_ENABLE=false
AUTH_OIDC_ISSUER=http://keycloak:8080/realms/inference-gateway-realm
AUTH_OIDC_CLIENT_ID=inference-gateway-client
AUTH_OIDC_CLIENT_SECRET=
# Server settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
SERVER_TLS_CERT_PATH=
SERVER_TLS_KEY_PATH=
# Client settings
CLIENT_TIMEOUT=30s
CLIENT_MAX_IDLE_CONNS=20
CLIENT_MAX_IDLE_CONNS_PER_HOST=20
CLIENT_IDLE_CONN_TIMEOUT=30s
CLIENT_TLS_MIN_VERSION=TLS12
CLIENT_DISABLE_COMPRESSION=true
CLIENT_RESPONSE_HEADER_TIMEOUT=10s
CLIENT_EXPECT_CONTINUE_TIMEOUT=1s
# Providers
ANTHROPIC_API_URL=https://api.anthropic.com/v1
ANTHROPIC_API_KEY=
CLOUDFLARE_API_URL=https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai
CLOUDFLARE_API_KEY=
COHERE_API_URL=https://api.cohere.ai
COHERE_API_KEY=
GROQ_API_URL=https://api.groq.com/openai/v1
GROQ_API_KEY=
OLLAMA_API_URL=http://ollama:8080/v1
OLLAMA_API_KEY=
OLLAMA_CLOUD_API_URL=https://ollama.com/v1
OLLAMA_CLOUD_API_KEY=
OPENAI_API_URL=https://api.openai.com/v1
OPENAI_API_KEY=
DEEPSEEK_API_URL=https://api.deepseek.com
DEEPSEEK_API_KEY=
GOOGLE_API_URL=https://generativelanguage.googleapis.com/v1beta/openai
GOOGLE_API_KEY=
MISTRAL_API_URL=https://api.mistral.ai/v1
MISTRAL_API_KEY=

Configuration Best Practices

  1. API Key Security: Never commit API keys to version control. Use environment variables or secrets management.
  2. TLS in Production: Always use TLS in production environments to secure data in transit.
  3. Authentication: Enable authentication in production environments to control access.
  4. Timeouts: Adjust timeouts based on your expected workloads and response times from LLM providers.
  5. Monitoring: Enable telemetry in production for observability and performance tracking.

Next Steps

Once you've configured Inference Gateway, you might want to: