Supported Providers
Inference Gateway provides a unified interface to interact with multiple LLM providers. This page details each supported provider, their configuration, and usage examples.
Available Providers
The following LLM providers are currently supported:
OpenAI
Access GPT models including GPT-5, GPT-5.2, GPT-4.1, and GPT-4o.
Authentication: Bearer Token
Default URL: https://api.openai.com/v1
Vision Support: ✅ Yes (GPT-5 series, GPT-4.1, GPT-4o)
DeepSeek
Use DeepSeek's models for various natural language tasks.
Authentication: Bearer Token
Default URL: https://api.deepseek.com
Vision Support: ❌ No
Anthropic
Connect to Claude models for high-quality conversational AI.
Authentication: X-Header
Default URL: https://api.anthropic.com/v1
Vision Support: ✅ Yes (Claude Opus 4.5, Claude Sonnet 4, Claude Opus 4)
Cohere
Use Cohere's models for various natural language tasks.
Authentication: Bearer Token
Default URL: https://api.cohere.com
Vision Support: ✅ Yes (Command A Vision)
Groq
Access high-performance inference with Groq's LPU-accelerated models.
Authentication: Bearer Token
Default URL: https://api.groq.com/openai/v1
Vision Support: ✅ Yes (vision models)
Cloudflare
Connect to Cloudflare Workers AI for inference on various models.
Authentication: Bearer Token
Default URL: https://api.cloudflare.com/client/v4/accounts/
Vision Support: ❌ No
Ollama
Run open-source models locally or on a self-hosted server.
Authentication: None (optional API key)
Default URL: http://ollama:8080/v1
Vision Support: ✅ Yes (LLaVA, Llama 4, Llama 3.2 Vision)
Google
Access Google's Gemini models for text generation and understanding.
Authentication: Bearer Token
Default URL: https://generativelanguage.googleapis.com/v1
Vision Support: ✅ Yes (Gemini 3 Flash, Gemini 3 Pro)
Mistral
Access Mistral AI's models, including the vision-capable Pixtral Large and Mistral Large 3.
Authentication: Bearer Token
Default URL: https://api.mistral.ai/v1
Vision Support: ✅ Yes (Pixtral Large, Ministral 3, Mistral Large 3)
Moonshot
Use Moonshot AI's Kimi models for natural language and vision tasks.
Authentication: Bearer Token
Default URL: https://api.moonshot.ai/v1
Vision Support: ✅ Yes (Kimi K2, Kimi K2 Thinking)
Vision/Multimodal Support
Several providers support vision/multimodal capabilities, allowing you to process images alongside text. To use vision features, you must enable them in your configuration:
ENABLE_VISION=true
Note: Vision support is disabled by default for performance and security reasons. When disabled, requests containing image content will be rejected even if the model supports vision.
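To enable vision end to end, set the flag alongside your provider credentials (a minimal sketch; the key value is a placeholder):
export ENABLE_VISION=true
export ANTHROPIC_API_KEY=sk-ant-...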
Providers with Vision Support
- OpenAI: GPT-5 series, GPT-4.1, GPT-4o
- Anthropic: Claude Opus 4.5, Claude Sonnet 4, Claude Opus 4
- Google: Gemini 3 Flash, Gemini 3 Pro
- Cohere: Command A Vision
- Ollama: LLaVA, Llama 4, Llama 3.2 Vision
- Groq: Vision models
- Mistral: Pixtral Large, Ministral 3, Mistral Large 3
- Moonshot: Kimi K2, Kimi K2 Thinking
Example Vision Request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'
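For local images, the same content format accepts data URLs. A sketch, assuming the gateway forwards OpenAI-style base64 data URLs unchanged (cat.jpg is a placeholder; base64 -w0 is the GNU coreutils flag):
IMAGE_DATA=$(base64 -w0 cat.jpg)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"anthropic/claude-sonnet-4-20250514\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          { \"type\": \"text\", \"text\": \"What is in this image?\" },
          { \"type\": \"image_url\", \"image_url\": { \"url\": \"data:image/jpeg;base64,${IMAGE_DATA}\" } }
        ]
      }
    ]
  }"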
Using Providers
Provider Configuration
Each provider requires specific configuration through environment variables:
- PROVIDER_API_URL: The base URL for the provider's API
- PROVIDER_API_KEY: The authentication key for the provider
Replace "PROVIDER" with the provider name (uppercase): OPENAI, ANTHROPIC, COHERE, GROQ, CLOUDFLARE, OLLAMA, GOOGLE, DEEPSEEK, MISTRAL, MOONSHOT.
API Endpoints
Inference Gateway offers two main approaches to interact with providers:
1. Unified Generate API
The unified API allows you to generate content with a consistent interface across all providers. Model names are prefixed with the provider, for example openai/gpt-4o:
POST /v1/chat/completions
Content-Type: application/json
{
  "model": "MODEL_NAME",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ]
}
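Responses follow the familiar OpenAI-style chat completion shape regardless of the underlying provider. An illustrative, abridged sketch (field values are placeholders):
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "MODEL_NAME",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 9,
    "total_tokens": 27
  }
}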
2. Provider Proxy
You can also proxy requests directly to the provider's native API:
POST /proxy/{provider}/{path}
Content-Type: application/json
// Provider-specific request body
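For example, to call OpenAI's native chat completions endpoint through the proxy (a sketch, assuming {path} maps directly onto the provider's own API path):
curl -X POST http://localhost:8080/proxy/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'
Note that in proxy mode the request body uses the provider's native format, so the model name is not provider-prefixed.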
Provider-Specific Examples
OpenAI Provider
Generate content with OpenAI models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'
List all available models:
curl http://localhost:8080/v1/models
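The response aggregates models from all configured providers. An illustrative, abridged sketch of the list format, assuming provider-prefixed IDs as used in the request examples:
{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "object": "model",
      "owned_by": "openai"
    },
    {
      "id": "anthropic/claude-sonnet-4-20250514",
      "object": "model",
      "owned_by": "anthropic"
    }
  ]
}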
DeepSeek Provider
Generate content with DeepSeek models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-reasoner",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
List available models:
curl "http://localhost:8080/v1/models?provider=deepseek"
Anthropic Provider
Generate content with Anthropic Claude models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ]
  }'
List available models:
curl "http://localhost:8080/v1/models?provider=anthropic"
Cohere Provider
Generate content with Cohere models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-a-03-2025",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a short poem about AI."
      }
    ]
  }'
Groq Provider
Generate content with Groq's high-performance models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What are the benefits of quantum computing?"
      }
    ]
  }'
Cloudflare Provider
Generate content with Cloudflare Workers AI:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cloudflare/@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain how neural networks work."
      }
    ]
  }'
Ollama Provider
Generate content with locally-hosted Ollama models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3.3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a function to calculate Fibonacci numbers in Python."
      }
    ]
  }'
Google Provider
Generate content with Google's Gemini models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain the concept of machine learning in simple terms."
      }
    ]
  }'
Mistral Provider
Generate content with Mistral AI models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral/mistral-large-3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain the differences between supervised and unsupervised learning."
      }
    ]
  }'
Moonshot Provider
Generate content with Moonshot AI models:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshot/kimi-k2-thinking",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What are the key principles of clean code?"
      }
    ]
  }'