API Reference

Inference Gateway provides a RESTful API for interacting with language models from various providers. This reference documents all available endpoints, request formats, and response structures.

Base URL

All API endpoints are relative to your Inference Gateway installation URL. By default, this is http://localhost:8080 when running locally.

Authentication

If authentication is enabled (ENABLE_AUTH=true), all requests must include a bearer token:

HTTP
Authorization: Bearer YOUR_JWT_TOKEN

The JWT token is issued by the configured Identity Provider (IdP) as specified in the OpenID Connect settings (OIDC_ISSUER_URL, OIDC_CLIENT_ID, etc.). Inference Gateway validates these tokens against the IdP to authenticate requests.

API Endpoints

List All Models

Get a list of all available language models across all configured providers.

HTTP
GET /v1/models

Response:

HTTP
Status: 200 OK
Content-Type: application/json

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai",
    },
    {
      "id": "claude-3-opus-20240229",
      "object": "model",
      "created": 1741879542,
      "owned_by": "anthropic",
      "served_by": "anthropic",
    },
    {
      "id": "deepseek-r1-distill-llama-70b",
      "object": "model",
      "created": 1741879542,
      "owned_by": "Meta",
      "served_by": "groq",
    }
    ...
  ]
}

List Provider Models

Get a list of available models for a specific provider.

HTTP
GET /v1/models?provider={provider}

where {provider} is one of: openai, anthropic, cohere, groq, cloudflare, ollama, deepseek.

Response:

HTTP
Status: 200 OK
Content-Type: application/json

{
  "provider": "openai",
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai",
    },
    {
      "id": "gpt-3.5-turbo",
      "object": "model",
      "created": 1741879542,
      "owned_by": "openai",
      "served_by": "openai",
    }
  ]
}

Chat Completions

Chat Completions using a specific provider's language model.

HTTP
POST /v1/chat/completions?provider={provider}

Request Body

JSON
{
  "model": "ollama/deepseek-r1:1.5b",   // Model name (e.g., "gpt-4", "claude-3-opus-20240229")
  "messages": [        // Array of messages in the conversation
    {
      "role": "system" | "user" | "assistant" | "tool", // Role of the message sender
      "content": "Hi, how are you doing today?"         // Message content
    }
  ],
  "stream": false,     // Optional: Stream tokens as they're generated (default: false)
  "max_tokens": 1000,  // Optional: Maximum number of tokens to generate
  "tools": [           // Optional: Tools the model can use
    {
      "type": "function",
      "name": "string",
      "description": "string",
      "parameters": {
        // JSON Schema object defining function parameters
      }
    }
  ]
}

Response:

HTTP
Status: 200 OK
Content-Type: application/json

{
  "id": "chatcmpl-753",
  "object": "chat.completion",
  "created": 1741879542,
  "model": "deepseek-r1:1.5b",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "<think>\nOkay, so the user greeted me and said \"Hi, how are you doing today?\" They're just starting to say that. I should respond in a friendly way.\n\nMaybe I can acknowledge their greeting and offer my help with something. Since they mentioned working on math problems or solving puzzles, I'll stick to that.\n\nI want to make sure I'm approaching it the right way. It's not a question yet, but if I see more of them, maybe I can offer more assistance. So responding with an emoji like 😊 would be nice.\n</think>\n\nHello! How are you doing today? 😊",
        "role": "assistant"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 40,
    "completion_tokens": 40,
    "total_tokens": 80
  }
}

Streaming Response

When stream: true is specified, responses are streamed as Server Sent Events(SSE) objects:

HTTP
curl -X POST http://localhost:8080/v1/chat/completions\?provider\=ollama -d '{
  "model": "deepseek-r1:1.5b",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hi, how are you doing today?"
    }
  ],
  "stream": true,
  "max_tokens": 40
}'

Response:

HTTP
Status: 200 OK
Content-Type: text/event-stream

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481679,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\u003cthink\u003e"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"Okay"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":","},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481680,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" so"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481681,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481681,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" need"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" to"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" figure"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481685,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" out"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" what"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" best"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" answer"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" for"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481686,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" question"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481688,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \""},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481688,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"How"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" should"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" handle"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" these"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" deep"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" conversations"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481689,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" over"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" phone"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"?\""},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" \n\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"Let"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" me"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" start"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481690,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" by"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" understanding"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" what"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":"'s"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" being"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":" asked"},"finish_reason":null}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":"length"}]}

data: {"id":"chatcmpl-509","object":"chat.completion.chunk","created":1742481691,"model":"deepseek-r1:1.5b","system_fingerprint":"fp_ollama","choices":[],"usage":{"prompt_tokens":17,"completion_tokens":40,"total_tokens":57}}

data: [DONE]

Note that the final message contains the usage metrics of the token completion.

Proxy Requests

Pass requests directly through to provider APIs.

HTTP
{METHOD} /proxy/{provider}/{path}

Where:

  • {METHOD} is any HTTP method (GET, POST, PUT, DELETE, PATCH)
  • {provider} is one of the supported providers
  • {path} is the path to proxy to the provider API

Example: OpenAI Chat Completion

HTTP
POST /proxy/openai/v1/chat/completions
Content-Type: application/json

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Hello! How can I assist you today?"
    }
  ],
  "temperature": 0.7
}

Health Check

Check if the Inference Gateway service is running.

HTTP
GET /health

Response:

HTTP
Status: 200 OK

Error Responses

When an error occurs, the API returns an appropriate HTTP status code with an error message:

HTTP
{
  "error": "Error message description"
}

Common error status codes

  • 400 Bad Request - Invalid request parameters
  • 401 Unauthorized - Missing or invalid authentication
  • 500 Internal Server Error - Server-side error

Advanced Features

Streaming Responses

You can stream responses from supported providers by setting the stream parameter to true:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a story about a space explorer."
      }
    ],
    "stream": true
  }'

Tool Use

For providers that support function calling (like OpenAI and Anthropic), you can use the tools parameter:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the weather in Paris?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'

Direct API Proxy

For more advanced use cases, you can proxy requests directly to the provider's API:

Terminal
curl -X POST http://localhost:8080/proxy/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

OpenAPI Specification

For a complete API reference in OpenAPI format, you can view the

OpenAPI specification file