Supported Providers

Inference Gateway provides a unified interface for interacting with multiple LLM providers. This page describes each supported provider, its configuration, and usage examples.

Available Providers

The following LLM providers are currently supported:

OpenAI

Access GPT models including GPT-3.5, GPT-4, and more.

Authentication: Bearer Token

Default URL: https://api.openai.com/v1

DeepSeek

Use DeepSeek's models for various natural language tasks.

Authentication: Bearer Token

Default URL: https://api.deepseek.com

Anthropic

Connect to Claude models for high-quality conversational AI.

Authentication: X-Header (API key sent in the x-api-key header)

Default URL: https://api.anthropic.com/v1

Cohere

Use Cohere's models for various natural language tasks.

Authentication: Bearer Token

Default URL: https://api.cohere.com

Groq

Access high-performance inference with Groq's LPU-accelerated models.

Authentication: Bearer Token

Default URL: https://api.groq.com/openai/v1

Cloudflare

Connect to Cloudflare Workers AI for inference on various models.

Authentication: Bearer Token

Default URL: https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai

Ollama

Run open-source models locally or on a self-hosted server.

Authentication: None (optional API key)

Default URL: http://ollama:8080/v1

Using Providers

Provider Configuration

Each provider requires specific configuration through environment variables:

  • PROVIDER_API_URL: The base URL for the provider's API
  • PROVIDER_API_KEY: The authentication key for the provider

Replace "PROVIDER" with the provider name in uppercase: OPENAI, DEEPSEEK, ANTHROPIC, COHERE, GROQ, CLOUDFLARE, or OLLAMA.
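As a rough illustration of the naming scheme above, the variables for two of the listed providers might be exported as follows. The URLs are the defaults listed on this page. The key value is a placeholder, and leaving the Ollama key unset is based on the "None (optional API key)" note above.

```shell
# OpenAI: default URL from the provider list; the key is a placeholder.
export OPENAI_API_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="replace-with-your-openai-key"

# Ollama: authentication is optional, so only the URL is set here.
export OLLAMA_API_URL="http://ollama:8080/v1"
```

The same pattern should apply to the other providers, with the prefix swapped for the provider's uppercase name.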

API Endpoints

Inference Gateway offers two main ways to interact with providers:

1. Unified Generate API

The unified API allows you to generate content with a consistent interface across all providers:

HTTP
POST /v1/chat/completions
Content-Type: application/json

{
  "model": "MODEL_NAME",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, world!"
    }
  ]
}
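Because the request shape is the same for every provider, it can be wrapped in a small shell helper. The sketch below is one possible way to do that, not part of Inference Gateway itself: `chat_url`, `chat`, and the `GATEWAY_URL` variable are hypothetical names, the default address is the localhost URL used in the examples on this page, and the `provider` query parameter is the one that appears in the provider-specific examples.

```shell
# Hypothetical helpers. GATEWAY_URL is an assumed variable that defaults to
# the localhost address used in the examples on this page.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"

# Print the chat completions URL, adding ?provider=NAME when a name is given.
chat_url() {
  if [ -n "${1:-}" ]; then
    printf '%s/v1/chat/completions?provider=%s\n' "$GATEWAY_URL" "$1"
  else
    printf '%s/v1/chat/completions\n' "$GATEWAY_URL"
  fi
}

# Send one user message. Assumes the model name and message contain no
# characters that need JSON escaping (quotes, backslashes, newlines).
chat() {
  provider="$1"
  model="$2"
  message="$3"
  body=$(printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$model" "$message")
  curl -s -X POST "$(chat_url "$provider")" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

With these defined, `chat groq "MODEL_NAME" "Hello, world!"` would send the request to the gateway with `?provider=groq`. For messages with arbitrary text, building the body with a JSON-aware tool would be safer than `printf`.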

2. Provider Proxy

You can also proxy requests directly to the provider's native API:

HTTP
POST /proxy/{provider}/{path}
Content-Type: application/json

// Provider-specific request body
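The proxy route can be assembled the same way. This is a minimal sketch, assuming a hypothetical `proxy_url` helper and `GATEWAY_URL` variable. How `{path}` maps onto the provider's base URL is not specified here, so the path and body in the example call are left as placeholders.

```shell
# Hypothetical helper. GATEWAY_URL is an assumed variable that defaults to
# the localhost address used in the examples on this page.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"

# Build /proxy/{provider}/{path}. A single leading slash is dropped from the
# path so the result does not contain a double slash.
proxy_url() {
  printf '%s/proxy/%s/%s\n' "$GATEWAY_URL" "$1" "${2#/}"
}

# Example call (commented out). PROVIDER_NATIVE_PATH and request.json are
# placeholders that depend on the provider's native API.
# curl -s -X POST "$(proxy_url openai "PROVIDER_NATIVE_PATH")" \
#   -H "Content-Type: application/json" \
#   -d @request.json
```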

Provider-Specific Examples

OpenAI Provider

Generate content with OpenAI models:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'

List all available models:

Terminal
curl http://localhost:8080/v1/models

DeepSeek Provider

Generate content with DeepSeek models:

Terminal
curl -X POST "http://localhost:8080/v1/chat/completions?provider=deepseek" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

List available models:

Terminal
curl "http://localhost:8080/v1/models?provider=deepseek"

Anthropic Provider

Generate content with Anthropic Claude models:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ]
  }'

List available models:

Terminal
curl "http://localhost:8080/v1/models?provider=anthropic"

Cohere Provider

Generate content with Cohere models:

Terminal
curl -X POST "http://localhost:8080/v1/chat/completions?provider=cohere" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a short poem about AI."
      }
    ]
  }'

Groq Provider

Generate content with Groq's high-performance models:

Terminal
curl -X POST "http://localhost:8080/v1/chat/completions?provider=groq" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-70b-4096",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What are the benefits of quantum computing?"
      }
    ]
  }'

Cloudflare Provider

Generate content with Cloudflare Workers AI:

Terminal
curl -X POST "http://localhost:8080/v1/chat/completions?provider=cloudflare" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain how neural networks work."
      }
    ]
  }'

Ollama Provider

Generate content with locally hosted Ollama models:

Terminal
curl -X POST "http://localhost:8080/v1/chat/completions?provider=ollama" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a function to calculate Fibonacci numbers in Python."
      }
    ]
  }'