Supported Providers
Inference Gateway provides a unified interface for interacting with multiple LLM providers. This page describes each supported provider, its configuration, and usage examples.
Available Providers
The following LLM providers are currently supported:
OpenAI
Access GPT models, including GPT-3.5 and GPT-4.
Authentication: Bearer Token
Default URL: https://api.openai.com/v1
DeepSeek
Access DeepSeek's chat and reasoning models, such as deepseek-chat and deepseek-reasoner.
Authentication: Bearer Token
Default URL: https://api.deepseek.com
Anthropic
Connect to Claude models for high-quality conversational AI.
Authentication: API key header (x-api-key)
Default URL: https://api.anthropic.com/v1
Cohere
Use Cohere's models, such as Command, for text generation and other natural language tasks.
Authentication: Bearer Token
Default URL: https://api.cohere.com
Groq
Access high-performance inference with Groq's LPU-accelerated models.
Authentication: Bearer Token
Default URL: https://api.groq.com/openai/v1
Cloudflare
Connect to Cloudflare Workers AI for inference on various models.
Authentication: Bearer Token
Default URL: https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai
Ollama
Run open-source models locally or on a self-hosted server.
Authentication: None (optional API key)
Default URL: http://ollama:8080/v1
Using Providers
Provider Configuration
Each provider requires specific configuration through environment variables:
PROVIDER_API_URL: The base URL for the provider's API
PROVIDER_API_KEY: The authentication key for the provider
Replace "PROVIDER" with the provider name (uppercase): OPENAI, ANTHROPIC, COHERE, GROQ, CLOUDFLARE, OLLAMA.
API Endpoints
Inference Gateway offers two main approaches to interact with providers:
1. Unified Generate API
The unified API allows you to generate content with a consistent interface across all providers. The target provider can be selected with the provider query parameter, as the examples below show:
POST /v1/chat/completions
Content-Type: application/json

{
  "model": "MODEL_NAME",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ]
}
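Responses follow the OpenAI-compatible chat completion format. For illustration only (exact fields may vary by provider and gateway version), a response might look roughly like:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      }
    }
  ]
}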
2. Provider Proxy
You can also proxy requests directly to the provider's native API:
POST /proxy/{provider}/{path}
Content-Type: application/json

// Provider-specific request body
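For example, a request to OpenAI's native chat completions endpoint through the proxy might look like this (a sketch, assuming the gateway forwards the path and body to the provider unchanged):

curl -X POST "http://localhost:8080/proxy/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'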
Provider-Specific Examples
OpenAI Provider
Generate content with OpenAI models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=openai" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'
List all available models:
curl http://localhost:8080/v1/models
DeepSeek Provider
Generate content with DeepSeek models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=deepseek" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
List available models:
curl "http://localhost:8080/v1/models?provider=deepseek"
Anthropic Provider
Generate content with Anthropic Claude models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=anthropic" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ]
  }'
List available models:
curl "http://localhost:8080/v1/models?provider=anthropic"
Cohere Provider
Generate content with Cohere models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=cohere" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a short poem about AI."
      }
    ]
  }'
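List available models:
curl "http://localhost:8080/v1/models?provider=cohere"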
Groq Provider
Generate content with Groq's high-performance models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=groq" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-70b-4096",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What are the benefits of quantum computing?"
      }
    ]
  }'
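List available models:
curl "http://localhost:8080/v1/models?provider=groq"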
Cloudflare Provider
Generate content with Cloudflare Workers AI:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=cloudflare" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain how neural networks work."
      }
    ]
  }'
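List available models:
curl "http://localhost:8080/v1/models?provider=cloudflare"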
Ollama Provider
Generate content with locally hosted Ollama models:
curl -X POST "http://localhost:8080/v1/chat/completions?provider=ollama" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a function to calculate Fibonacci numbers in Python."
      }
    ]
  }'
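List available models:
curl "http://localhost:8080/v1/models?provider=ollama"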