Inference Gateway

One API for every LLM

Open-source, cloud-native proxy unifying OpenAI, Anthropic, Groq, Cohere, Ollama, Ollama Cloud, DeepSeek, Cloudflare, Google, Mistral, MiniMax, Moonshot, Nvidia and llama.cpp behind a single OpenAI-compatible API.

🚀

Unified API access

Talk to OpenAI, Anthropic, Groq, Cohere, Ollama, Ollama Cloud, DeepSeek, Cloudflare, Google, Mistral, MiniMax, Moonshot, Nvidia and llama.cpp through one OpenAI-compatible endpoint.

Supported providers

🔌

MCP integration

Native Model Context Protocol support. Auto-discover tools from connected MCP servers and execute tool calls server-side.

MCP guide

🤖

Agent-to-Agent (A2A)

Coordinate specialized agents from inside any chat completion. Discover capabilities, delegate tasks, stream results.

A2A guide

📝

Define agents as code

Describe an A2A agent once in an Agent Definition Language (ADL) YAML file, then generate an enterprise-ready Go or Rust server with the ADL CLI.

Agent Definition Language

🌊

First-class streaming

Server-Sent Events streaming with token-level deltas, tool-call chunks, and final usage metrics.

API reference

☸️

Kubernetes-native

Kubernetes Operator with CRDs for gateways, agents, MCP servers, and chat orchestrators - declarative, GitOps-friendly cluster management.

Operator guide

📊

OpenTelemetry built-in

Prometheus metrics, OTLP tracing, structured JSON logs, reference Grafana dashboards. Production observability out of the box.

Observability

🛡️

Enterprise-ready auth

OIDC authentication with Keycloak and any standards-compliant identity provider. JWT validation against the issuer's JWKS.

Authentication

🌿

Lightweight

~10.8 MB static binary. Minimal CPU and memory footprint. Designed to scale horizontally with HPA in Kubernetes.

🔒

Privacy-first

No analytics, no telemetry phoning home. Self-host anywhere - on-prem, cloud, or air-gapped.

Why Inference Gateway?

Building against multiple LLM providers means juggling SDKs, API quirks, auth schemes, and streaming protocols that drift constantly. Inference Gateway sits in front of every provider and exposes a single, stable, OpenAI-compatible surface so your application code never has to care which model is on the other end.

Switch providers with one config change and no application redeploys.
Centralise API keys, rate limiting, and audit logging at the gateway.
Add MCP tools or A2A agents once and get them for every model that supports tool calls.
Run the same binary in Docker or Kubernetes - and let the Kubernetes Operator manage gateways, agents, MCP servers, and orchestrators as Custom Resources.

How it works

Inference Gateway acts as an intermediary between your applications and various LLM providers. By standardising the API interactions, it lets you:

Access multiple LLM providers through a single integration.
Switch between providers without changing application code.
Implement sophisticated routing and fallback mechanisms.
Centralise API key management and security policies.
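
As a minimal sketch of switching providers without touching application code: the request body stays the same and only the `provider/model` string changes. The helper names `build_payload` and `ask` and the `GATEWAY_URL` variable are illustrative assumptions, not part of the gateway; the endpoint path and the model name are taken from the MCP example on this page.

```shell
# Base URL of a locally running gateway (assumed default, override as needed).
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"

# Build an OpenAI-style chat completion body for a model and a prompt.
# The prompt is inserted verbatim, so this sketch assumes it contains
# no double quotes or backslashes.
build_payload() {
  local model="$1" prompt="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$model" "$prompt"
}

# Send the prompt to whichever model is named in the first argument.
ask() {
  curl -sS -X POST "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")"
}

# Usage (requires a running gateway with the provider configured):
#   ask "deepseek/deepseek-v4-flash" "Hello"
#   ask "<other-provider>/<model>" "Hello"
```

Because the provider is selected by the model string alone, moving to another provider is a matter of changing one argument or one environment variable.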

Model Context Protocol (MCP)

Native support for the Model Context Protocol lets LLMs automatically access external tools and data sources. With MCP integration, you can:

Automatically discover tools from connected MCP servers.
Execute tool calls server-side, with no tool handling required in the client.
Connect multiple data sources like filesystems, databases, and APIs.
Extend LLM capabilities with custom tools and integrations.

bash

# Enable MCP with multiple servers
export MCP_ENABLED=true
export MCP_SERVERS="http://filesystem-server:8081/mcp,http://search-server:8082/mcp"

# LLMs automatically get access to all available tools
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek/deepseek-v4-flash", "messages": [{"role": "user", "content": "List files and search for recent AI news"}]}'

Learn more about MCP Integration and explore the examples.
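
When the list of MCP servers grows, it can be easier to keep one URL per line and join them into the comma-separated `MCP_SERVERS` value. The following is a small sketch under that assumption; `join_servers` is a hypothetical helper, and the two URLs are the ones from the example above.

```shell
# Join all arguments with commas (hypothetical helper, not part of the gateway).
join_servers() {
  local IFS=,
  printf '%s' "$*"
}

MCP_ENABLED=true
MCP_SERVERS="$(join_servers \
  "http://filesystem-server:8081/mcp" \
  "http://search-server:8082/mcp")"
export MCP_ENABLED MCP_SERVERS
```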

Agent-to-Agent (A2A)

Agent-to-Agent support lets LLMs coordinate with multiple specialised agents in a single conversation. With A2A, you can:

Coordinate multiple agents in a single conversation.
Access specialised services like calendars, calculators, and weather APIs.
Discover agent capabilities automatically.
Scale agent ecosystems with distributed architecture.

The best way to use A2A is through the Inference Gateway CLI, which provides seamless integration with A2A agents:

bash

# Install the CLI
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash

# Initialize and start chatting
infer init
infer chat

# Inside the chat session, delegate tasks to A2A agents with plain-language prompts, for example:
#   Schedule a team meeting for tomorrow at 2 PM
#   Check my calendar for conflicts this week

Learn more about A2A Integration and see how to build your own agents.

Agent Definition Language (ADL)

Prefer to define an agent as code? The Agent Definition Language (ADL) describes an entire A2A agent - provider, model, tools, skills, server, and deployment - in a single declarative agent.yaml file. The ADL CLI turns that manifest into an enterprise-ready Go or Rust project, so the agent stays version-controlled and reproducible.

bash

# Scaffold, validate, and generate an A2A agent from a declarative manifest
adl init my-weather-agent
adl validate agent.yaml
adl generate --file agent.yaml --output ./my-weather-agent

Read the Agent Definition Language overview to see how ADL, the ADL CLI, and the ADK fit together, or jump straight to the canonical spec at adl.inference-gateway.com.

Community

Inference Gateway is an open-source project maintained by a growing community. Contributions are welcome on GitHub.
