API Reference
Inference Gateway provides a RESTful API for interacting with language models from various providers. This reference documents all available endpoints, request formats, and response structures.
Base URL
All API endpoints are relative to your Inference Gateway installation URL.
By default, this is http://localhost:8080 when running locally.
Authentication
If authentication is enabled (ENABLE_AUTH=true), all requests must include a bearer token:
Authorization: Bearer YOUR_JWT_TOKEN
The JWT is issued by the configured Identity Provider (IdP), as specified in the OpenID Connect settings (OIDC_ISSUER_URL, OIDC_CLIENT_ID, etc.). Inference Gateway validates these tokens against the IdP to authenticate requests.
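As an illustration of the header described above, the following minimal Python sketch builds a request with the standard library and attaches the bearer token only when one is supplied. The helper name build_request and the BASE_URL constant are hypothetical, introduced here for illustration; the token value is the placeholder used in this reference.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # default local address from this reference


def build_request(path, token=None, base_url=BASE_URL):
    """Build a GET request, attaching a bearer token when one is supplied."""
    headers = {"Accept": "application/json"}
    if token:
        # Only required when the gateway runs with ENABLE_AUTH=true.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)


req = build_request("/v1/models", token="YOUR_JWT_TOKEN")
# To send it against a running gateway: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the authentication logic easy to check without a live gateway.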
API Endpoints
List All Models
Get a list of all available language models across all configured providers.
GET /v1/models
Response:
Status: 200 OK
Content-Type: application/json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai"
    },
    {
      "id": "claude-3-opus-20240229",
      "object": "model",
      "created": 1741879542,
      "owned_by": "anthropic",
      "served_by": "anthropic"
    },
    {
      "id": "deepseek-r1-distill-llama-70b",
      "object": "model",
      "created": 1741879542,
      "owned_by": "Meta",
      "served_by": "groq"
    }
    ...
  ]
}
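Because owned_by and served_by can differ (the last entry is owned by Meta but served by groq), a client may want to group models by the provider that actually serves them. The sketch below assumes only the response shape shown above; the function name models_by_provider is hypothetical, and the sample is an abbreviated copy of that response.

```python
import json
from collections import defaultdict


def models_by_provider(payload):
    """Group model IDs by the provider that serves them."""
    grouped = defaultdict(list)
    for model in payload.get("data", []):
        grouped[model["served_by"]].append(model["id"])
    return dict(grouped)


# Abbreviated copy of the sample response from this reference.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4", "object": "model", "created": 1741879542,
     "owned_by": "openai", "served_by": "openai"},
    {"id": "deepseek-r1-distill-llama-70b", "object": "model", "created": 1741879542,
     "owned_by": "Meta", "served_by": "groq"}
  ]
}
""")
grouped = models_by_provider(sample)
```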
List Provider Models
Get a list of available models for a specific provider.
GET /v1/models?provider={provider}
where {provider} is one of: openai, anthropic, cohere, groq, cloudflare, ollama, deepseek.
Response:
Status: 200 OK
Content-Type: application/json
{
  "provider": "openai",
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai"
    },
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai"
    }
  ]
}
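A client can reject an unknown provider before sending the request. This sketch builds the URL for either listing endpoint. The provider set is copied from the list above, while the function name models_url and the client-side validation are assumptions for illustration, not gateway behavior.

```python
from urllib.parse import urlencode

# Provider names as listed in this reference.
SUPPORTED_PROVIDERS = {
    "openai", "anthropic", "cohere", "groq", "cloudflare", "ollama", "deepseek",
}


def models_url(provider=None, base_url="http://localhost:8080"):
    """Return the URL for listing all models, or only one provider's models."""
    url = base_url.rstrip("/") + "/v1/models"
    if provider is None:
        return url
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return url + "?" + urlencode({"provider": provider})


groq_url = models_url("groq")
```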
Chat Completions
Generate a chat completion using a specific provider's language model.
POST /v1/chat/completions?provider={provider}
Request Body
{
  "model": "ollama/deepseek-r1:1.5b", // Model name (e.g., "gpt-4", "claude-3-opus-20240229")
  "messages": [ // Array of messages in the conversation
    {
      "role": "system" | "user" | "assistant" | "tool", // Role of the message sender
      "content": "Hi, how are you doing today?" // Message content
    }
  ],
  "stream": false, // Optional: Stream tokens as they're generated (default: false)
  "max_tokens": 1000, // Optional: Maximum number of tokens to generate
  "tools": [ // Optional: Tools the model can use
    {
      "type": "function",
      "name": "string",
      "description": "string",
      "parameters": {
        // JSON Schema object defining function parameters
      }
    }
  ]
}
Response:
Status: 200 OK
Content-Type: application/json
{
  "id": "chatcmpl-753",
  "object": "chat.completion",
  "created": 1741879542,
  "model": "deepseek-r1:1.5b",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "<think>\nOkay, so the user greeted me and said \"Hi, how are you doing today?\" They're just starting to say that. I should respond in a friendly way.\n\nMaybe I can acknowledge their greeting and offer my help with something. Since they mentioned working on math problems or solving puzzles, I'll stick to that.\n\nI want to make sure I'm approaching it the right way. It's not a question yet, but if I see more of them, maybe I can offer more assistance. So responding with an emoji like 😊 would be nice.\n</think>\n\nHello! How are you doing today? 😊",
        "role": "assistant"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 40,
    "completion_tokens": 40,
    "total_tokens": 80
  }
}
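The following sketch shows one way a client might build the request body and read the reply. It relies only on the fields documented above. The helper names are hypothetical, and the handling of the <think>...</think> block reflects what the sample model output happens to contain, not a documented gateway feature. The sample response is abbreviated.

```python
import json
import re


def build_chat_body(model, messages, max_tokens=None, stream=False):
    """Serialize a chat completion request body as UTF-8 JSON bytes."""
    body = {"model": model, "messages": messages, "stream": stream}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return json.dumps(body).encode("utf-8")


def first_reply(response):
    """Return (content, finish_reason) of the first choice."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Separate a <think>...</think> block from the visible answer, if present."""
    match = THINK_RE.search(content)
    if not match:
        return "", content.strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return match.group(1).strip(), answer


# Abbreviated stand-in for the sample response in this reference.
sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "<think>\nGreet the user back.\n</think>\n\nHello! How are you doing today?",
        },
        "finish_reason": "length",
    }]
}
content, finish_reason = first_reply(sample)
reasoning, answer = split_reasoning(content)
```

A finish_reason of "length" indicates the reply stopped at the max_tokens limit, so callers may want to check it before treating the answer as complete.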
Streaming Response
When stream: true is specified, the response is streamed as Server-Sent Events (SSE):
curl -X POST "http://localhost:8080/v1/chat/completions?provider=ollama" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:1.5b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hi, how are you doing today?"
      }
    ],
    "stream": true,
    "max_tokens": 40
  }'
Response:
Status: 200 OK
Content-Type: text/event-stream
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481679,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\u003cthink\u003e"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"Okay"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" so"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481681,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481681,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" need"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" to"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" figure"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" out"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" what"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" best"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" answer"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" is"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" for"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" question"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481688,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \""},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481688,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"How"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" should"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" handle"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" these"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" deep"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" conversations"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" over"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" phone"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"?\""},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \n\n"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"Let"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" me"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" start"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" by"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" understanding"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" what"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"'s"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" being"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" asked"},"finish_reason":null}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"length"}]}
data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[],"usage":{"prompt_tokens":17,"completion_tokens":40,"total_tokens":57}}
data: [DONE]
Note that the last chunk before data: [DONE] has an empty choices array and carries the token usage metrics for the completion.
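A client typically concatenates the delta.content fragments, remembers the last non-null finish_reason, and picks up usage from the chunk whose choices array is empty. The sketch below does this over an iterable of already-decoded text lines. The function name consume_stream is hypothetical, and reading the lines from the HTTP response is left out. The sample lines are a shortened version of the stream above.

```python
import json


def consume_stream(lines):
    """Accumulate content deltas, finish_reason, and usage from SSE lines."""
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), finish_reason, usage


# Shortened version of the stream shown in this reference.
sample_lines = [
    'data: {"id":"chatcmpl-509","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Okay"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-509","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-509","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":" so"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-509","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"length"}]}',
    'data: {"id":"chatcmpl-509","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":17,"completion_tokens":40,"total_tokens":57}}',
    "data: [DONE]",
]
text, finish_reason, usage = consume_stream(sample_lines)
```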
Proxy Requests
Pass requests directly through to provider APIs.
{METHOD} /proxy/{provider}/{path}
Where:
{METHOD} is any HTTP method (GET, POST, PUT, DELETE, PATCH)
{provider} is one of the supported providers
{path} is the path to proxy to the provider API
Example: OpenAI Chat Completion
POST /proxy/openai/v1/chat/completions
Content-Type: application/json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Hello! How can I assist you today?"
    }
  ],
  "temperature": 0.7
}
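The {METHOD} /proxy/{provider}/{path} pattern can be assembled mechanically. This sketch builds such a request with the standard library. The helper name build_proxy_request is hypothetical, and the body is passed through unchanged in whatever format the provider's own API expects.

```python
import json
import urllib.request


def build_proxy_request(method, provider, path, body=None,
                        base_url="http://localhost:8080"):
    """Build a request for {METHOD} /proxy/{provider}/{path}."""
    url = f"{base_url.rstrip('/')}/proxy/{provider}/{path.lstrip('/')}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=method.upper())


proxy_req = build_proxy_request(
    "post", "openai", "/v1/chat/completions",
    body={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,
    },
)
# To send it against a running gateway: urllib.request.urlopen(proxy_req)
```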
Health Check
Check if the Inference Gateway service is running.
GET /health
Response:
Status: 200 OK
Error Responses
When an error occurs, the API returns an appropriate HTTP status code with an error message:
{
  "error": "Error message description"
}
Common error status codes
400 Bad Request - Invalid request parameters
401 Unauthorized - Missing or invalid authentication
500 Internal Server Error - Server-side error
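A client can turn the documented error body into an exception that keeps the status code. The sketch below assumes the {"error": "..."} shape shown above and falls back to the raw body when the response does not match it. GatewayError and raise_for_error are hypothetical names, not part of any gateway SDK.

```python
import json


class GatewayError(Exception):
    """Carries the HTTP status and the message from the "error" field."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def raise_for_error(status, body):
    """Return parsed JSON for 2xx responses; raise GatewayError otherwise."""
    if 200 <= status < 300:
        return json.loads(body) if body else None
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError, TypeError):
        message = body  # body did not match the documented JSON shape
    raise GatewayError(status, message)


try:
    raise_for_error(401, '{"error": "Missing or invalid authentication"}')
except GatewayError as exc:
    caught = exc
```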
Advanced Features
Streaming Responses
You can stream responses from supported providers by setting the stream parameter to true:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a story about a space explorer."
      }
    ],
    "stream": true
  }'
Tool Use
For providers that support function calling (such as OpenAI and Anthropic), you can use the tools parameter:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'
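When several tools are defined, a small helper avoids repeating the JSON Schema wrapper by hand. This sketch produces entries in the same flat shape as the example above (type, name, description, parameters). The helper name make_tool is hypothetical, and the shape is taken from this reference's example, not from any provider's native format.

```python
import json


def make_tool(name, description, properties, required=()):
    """Return one entry for the "tools" array in the shape shown above."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(required),
        },
    }


weather_tool = make_tool(
    "get_weather",
    "Get the current weather in a location",
    {"location": {"type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"}},
    required=["location"],
)
body = {
    "model": "openai/gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
}
payload = json.dumps(body).encode("utf-8")  # bytes to POST to /v1/chat/completions
```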
Direct API Proxy
For more advanced use cases, you can proxy requests directly to the provider's API:
curl -X POST http://localhost:8080/proxy/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
OpenAPI Specification
For a complete API reference in OpenAPI format, see the OpenAPI specification file.