Configuration
Inference Gateway provides flexible configuration options to adapt to your specific needs. Because it is a proxy server that brokers access to multiple language model APIs, proper configuration is essential for performance and security.
Configuration Methods
Inference Gateway supports multiple configuration methods to suit different deployment scenarios:
- Environment Variables - Recommended for most deployments
- Kubernetes ConfigMaps and Secrets - For Kubernetes-based deployments
- Configuration Files - For local development and testing
Environment Variables
Environment variables are the primary method for configuring Inference Gateway. These variables control everything from basic server settings to provider-specific API configurations.
General Settings
| Variable | Description | Default |
|---|---|---|
| ENVIRONMENT | Deployment environment | production |
| ENABLE_VISION | Enable vision/multimodal support for all providers | false |
| TELEMETRY_ENABLE | Enable OpenTelemetry metrics and tracing | false |
| AUTH_ENABLE | Enable OIDC authentication | false |
When ENABLE_VISION is set to true, Inference Gateway enables vision/multimodal capabilities, allowing you to send images alongside text in chat completion requests. When disabled (the default, chosen for performance and security reasons), requests containing image content are rejected even if the provider and model support vision.
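For illustration, a vision-enabled request adds image content parts to a chat message. The sketch below assumes the gateway's OpenAI-compatible chat completions endpoint and uses a hypothetical vision-capable model name:

```bash
# Sketch: send text plus an image URL in one message.
# The model name "openai/gpt-4o" is an assumption for illustration.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```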
When TELEMETRY_ENABLE is set to true, Inference Gateway exposes a /metrics endpoint for Prometheus scraping and generates distributed traces that can be collected by OpenTelemetry collectors.
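You can verify telemetry by scraping the endpoint manually. The port below assumes the TELEMETRY_METRICS_PORT value shown in the complete example later on this page (9464); adjust it to match your deployment:

```bash
# Fetch Prometheus metrics from the gateway (port is an assumption).
curl http://localhost:9464/metrics
```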
OpenID Connect
If authentication is enabled (AUTH_ENABLE=true), configure the following OIDC settings:
| Variable | Description | Default |
|---|---|---|
| OIDC_ISSUER_URL | OIDC issuer URL | http://keycloak:8080/realms/inference-gateway-realm |
| OIDC_CLIENT_ID | OIDC client ID | inference-gateway-client |
| OIDC_CLIENT_SECRET | OIDC client secret | "" |
When authentication is enabled, all API requests must include a valid JWT token in the Authorization header:
```
Authorization: Bearer YOUR_JWT_TOKEN
```
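For example, a minimal authenticated request looks like the sketch below (obtaining the token from your OIDC provider is outside the scope of this page, and the endpoint path assumes the gateway's OpenAI-compatible API surface):

```bash
# List available models with a bearer token.
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```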
Server Settings
These settings control the core HTTP server behavior:
| Variable | Description | Default |
|---|---|---|
| SERVER_HOST | Server host | 0.0.0.0 |
| SERVER_PORT | Server port | 8080 |
| SERVER_READ_TIMEOUT | Read timeout | 30s |
| SERVER_WRITE_TIMEOUT | Write timeout | 30s |
| SERVER_IDLE_TIMEOUT | Idle timeout | 120s |
| SERVER_TLS_CERT_PATH | TLS certificate path | "" |
| SERVER_TLS_KEY_PATH | TLS key path | "" |
For production deployments, it's strongly recommended to configure TLS:
```bash
SERVER_TLS_CERT_PATH=/path/to/certificate.pem
SERVER_TLS_KEY_PATH=/path/to/private-key.pem
```
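For local testing only, you can generate a self-signed certificate with OpenSSL; production deployments should use certificates issued by a trusted CA:

```bash
# Self-signed certificate valid for 365 days (local testing only).
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout private-key.pem -out certificate.pem \
  -days 365 -subj "/CN=localhost"
```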
Client Settings
These settings control how Inference Gateway connects to third-party APIs:
| Variable | Description | Default |
|---|---|---|
| CLIENT_TIMEOUT | Client timeout | 30s |
| CLIENT_MAX_IDLE_CONNS | Maximum idle connections | 20 |
| CLIENT_MAX_IDLE_CONNS_PER_HOST | Maximum idle connections per host | 20 |
| CLIENT_IDLE_CONN_TIMEOUT | Idle connection timeout | 30s |
| CLIENT_TLS_MIN_VERSION | Minimum TLS version | TLS12 |
For high-throughput deployments, consider increasing the connection pool settings:
```bash
CLIENT_MAX_IDLE_CONNS=100
CLIENT_MAX_IDLE_CONNS_PER_HOST=50
```
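How these values are supplied depends on your deployment. As one sketch, a container runtime can pass them as environment variables at startup (the image name here is an assumption for illustration):

```bash
# Pass tuned connection-pool settings at container start.
# Image name is assumed for illustration.
docker run -p 8080:8080 \
  -e CLIENT_MAX_IDLE_CONNS=100 \
  -e CLIENT_MAX_IDLE_CONNS_PER_HOST=50 \
  ghcr.io/inference-gateway/inference-gateway:latest
```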
Provider Settings
Configure access to your LLM providers. At a minimum, set credentials for each provider you plan to use.
OpenAI
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_URL | OpenAI API URL | https://api.openai.com/v1 |
| OPENAI_API_KEY | OpenAI API Key | "" |
Anthropic
| Variable | Description | Default |
|---|---|---|
| ANTHROPIC_API_URL | Anthropic API URL | https://api.anthropic.com/v1 |
| ANTHROPIC_API_KEY | Anthropic API Key | "" |
Cohere
| Variable | Description | Default |
|---|---|---|
| COHERE_API_URL | Cohere API URL | https://api.cohere.com |
| COHERE_API_KEY | Cohere API Key | "" |
Groq
| Variable | Description | Default |
|---|---|---|
| GROQ_API_URL | Groq API URL | https://api.groq.com/openai/v1 |
| GROQ_API_KEY | Groq API Key | "" |
Ollama
| Variable | Description | Default |
|---|---|---|
| OLLAMA_API_URL | Ollama API URL | http://ollama:8080/v1 |
| OLLAMA_API_KEY | Ollama API Key | "" |
| OLLAMA_CLOUD_API_URL | Ollama Cloud API URL | https://ollama.com/v1 |
| OLLAMA_CLOUD_API_KEY | Ollama Cloud API Key | "" |
Cloudflare
| Variable | Description | Default |
|---|---|---|
| CLOUDFLARE_API_URL | Cloudflare API URL | https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai |
| CLOUDFLARE_API_KEY | Cloudflare API Key | "" |
DeepSeek
| Variable | Description | Default |
|---|---|---|
| DEEPSEEK_API_URL | DeepSeek API URL | https://api.deepseek.com |
| DEEPSEEK_API_KEY | DeepSeek API Key | "" |
Google
| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_URL | Google AI API URL | https://generativelanguage.googleapis.com/v1beta/openai |
| GOOGLE_API_KEY | Google AI API Key | "" |
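With at least one provider key set, requests can be routed through the gateway. The sketch below assumes the gateway's OpenAI-compatible chat completions endpoint and provider-prefixed model names; check the API Reference for the exact scheme:

```bash
# Route a chat completion to OpenAI through the gateway.
# The "provider/model" naming is an assumption for illustration.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```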
Model Context Protocol (MCP) Settings
These settings control MCP integration for external tool access:
| Variable | Description | Default |
|---|---|---|
| MCP_ENABLE | Enable MCP middleware | false |
| MCP_EXPOSE | Expose MCP endpoints for debugging | false |
| MCP_SERVERS | Comma-separated list of MCP server URLs | "" |
| MCP_CLIENT_TIMEOUT | MCP client timeout | 10s |
| MCP_DIAL_TIMEOUT | MCP dial timeout | 5s |
| MCP_TLS_HANDSHAKE_TIMEOUT | MCP TLS handshake timeout | 5s |
| MCP_RESPONSE_HEADER_TIMEOUT | MCP response header timeout | 5s |
| MCP_EXPECT_CONTINUE_TIMEOUT | MCP expect continue timeout | 2s |
| MCP_REQUEST_TIMEOUT | MCP request timeout | 10s |
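For example, to enable the middleware and attach two MCP servers (the URLs below are placeholders for your own servers):

```bash
MCP_ENABLE=true
MCP_EXPOSE=false
# Placeholder URLs; point these at your own MCP servers.
MCP_SERVERS=http://mcp-time-server:8081/mcp,http://mcp-search-server:8082/mcp
MCP_CLIENT_TIMEOUT=10s
```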
UI Settings
These settings control the Inference Gateway UI:
| Variable | Description | Default |
|---|---|---|
| INFERENCE_GATEWAY_URL | The URL of the Inference Gateway server | http://localhost:8080/v1 |
Logging and Debugging
These settings control logging and debugging behavior:
| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Set logging level (debug, info, warn, error) | info |
Environment Variable File (.env)
For local development, you can use a .env file. Create a file named .env in your project root:
```bash
# .env file example
ENVIRONMENT=development
TELEMETRY_ENABLE=false
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
```
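One common way to load this file is Docker Compose's env_file directive, sketched below (the image name is an assumption for illustration):

```yaml
# docker-compose.yml sketch: load settings from .env at startup.
services:
  inference-gateway:
    image: ghcr.io/inference-gateway/inference-gateway:latest # image name assumed
    env_file: .env
    ports:
      - '8080:8080'
```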
Kubernetes ConfigMaps and Secrets
When deploying in Kubernetes, use ConfigMaps for non-sensitive configuration and Secrets for API keys and other sensitive information.
Example ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway-config
data:
  ENVIRONMENT: 'production'
  TELEMETRY_ENABLE: 'true'
  SERVER_HOST: '0.0.0.0'
  SERVER_PORT: '8080'
  SERVER_READ_TIMEOUT: '30s'
  SERVER_WRITE_TIMEOUT: '30s'
  SERVER_IDLE_TIMEOUT: '120s'
```
Example Secret
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: inference-gateway-secrets
type: Opaque
data:
  ANTHROPIC_API_KEY: '<base64-encoded-key>'
  COHERE_API_KEY: '<base64-encoded-key>'
  OPENAI_API_KEY: '<base64-encoded-key>'
  OIDC_CLIENT_SECRET: '<base64-encoded-key>'
```
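A Deployment can then consume both with envFrom; a minimal sketch follows (the container image is an assumption for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-gateway
  template:
    metadata:
      labels:
        app: inference-gateway
    spec:
      containers:
        - name: inference-gateway
          image: ghcr.io/inference-gateway/inference-gateway:latest # image name assumed
          ports:
            - containerPort: 8080
          envFrom:
            # Inject all keys from the ConfigMap and Secret as env vars.
            - configMapRef:
                name: inference-gateway-config
            - secretRef:
                name: inference-gateway-secrets
```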
Complete Configuration Example
Here's a comprehensive example for configuring Inference Gateway in a production environment:
```bash
# General settings
ENVIRONMENT=production
ALLOWED_MODELS=
ENABLE_VISION=false
DEBUG_CONTENT_TRUNCATE_WORDS=10
DEBUG_MAX_MESSAGES=100
# Telemetry
TELEMETRY_ENABLE=false
TELEMETRY_METRICS_PORT=9464
# Model Context Protocol (MCP)
MCP_ENABLE=false
MCP_EXPOSE=false
MCP_SERVERS=
MCP_CLIENT_TIMEOUT=5s
MCP_DIAL_TIMEOUT=3s
MCP_TLS_HANDSHAKE_TIMEOUT=3s
MCP_RESPONSE_HEADER_TIMEOUT=3s
MCP_EXPECT_CONTINUE_TIMEOUT=1s
MCP_REQUEST_TIMEOUT=5s
MCP_MAX_RETRIES=3
MCP_RETRY_INTERVAL=5s
MCP_INITIAL_BACKOFF=1s
MCP_ENABLE_RECONNECT=true
MCP_RECONNECT_INTERVAL=30s
MCP_POLLING_ENABLE=true
MCP_POLLING_INTERVAL=30s
MCP_POLLING_TIMEOUT=5s
MCP_DISABLE_HEALTHCHECK_LOGS=true
# Authentication
AUTH_ENABLE=false
OIDC_ISSUER_URL=http://keycloak:8080/realms/inference-gateway-realm
OIDC_CLIENT_ID=inference-gateway-client
OIDC_CLIENT_SECRET=
# Server settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8080
SERVER_READ_TIMEOUT=30s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
SERVER_TLS_CERT_PATH=
SERVER_TLS_KEY_PATH=
# Client settings
CLIENT_TIMEOUT=30s
CLIENT_MAX_IDLE_CONNS=20
CLIENT_MAX_IDLE_CONNS_PER_HOST=20
CLIENT_IDLE_CONN_TIMEOUT=30s
CLIENT_TLS_MIN_VERSION=TLS12
CLIENT_DISABLE_COMPRESSION=true
CLIENT_RESPONSE_HEADER_TIMEOUT=10s
CLIENT_EXPECT_CONTINUE_TIMEOUT=1s
# Providers
ANTHROPIC_API_URL=https://api.anthropic.com/v1
ANTHROPIC_API_KEY=
CLOUDFLARE_API_URL=https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai
CLOUDFLARE_API_KEY=
COHERE_API_URL=https://api.cohere.com
COHERE_API_KEY=
GROQ_API_URL=https://api.groq.com/openai/v1
GROQ_API_KEY=
OLLAMA_API_URL=http://ollama:8080/v1
OLLAMA_API_KEY=
OLLAMA_CLOUD_API_URL=https://ollama.com/v1
OLLAMA_CLOUD_API_KEY=
OPENAI_API_URL=https://api.openai.com/v1
OPENAI_API_KEY=
DEEPSEEK_API_URL=https://api.deepseek.com
DEEPSEEK_API_KEY=
GOOGLE_API_URL=https://generativelanguage.googleapis.com/v1beta/openai
GOOGLE_API_KEY=
MISTRAL_API_URL=https://api.mistral.ai/v1
MISTRAL_API_KEY=
```
Configuration Best Practices
- API Key Security: Never commit API keys to version control. Use environment variables or a secrets manager (see the kubectl example after this list).
- TLS in Production: Always use TLS in production environments to secure data in transit.
- Authentication: Enable authentication in production environments to control access.
- Timeouts: Adjust timeouts based on your expected workloads and response times from LLM providers.
- Monitoring: Enable telemetry in production for observability and performance tracking.
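For example, in Kubernetes you can create the secret shown earlier directly from literals, so plaintext keys never land in a manifest file:

```bash
# Create the secret from literals; kubectl handles base64 encoding.
kubectl create secret generic inference-gateway-secrets \
  --from-literal=OPENAI_API_KEY='your-openai-key' \
  --from-literal=ANTHROPIC_API_KEY='your-anthropic-key' \
  --from-literal=OIDC_CLIENT_SECRET='your-client-secret'
```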
Next Steps
Once you've configured Inference Gateway, you might want to:
- Check out the API Reference for details on available endpoints
- Explore SDK options for integrating with your application
- Review Observability options for monitoring and logging