Inference Gateway Documentation
Inference Gateway is a proxy server designed to facilitate access to various language model APIs. It lets you interact with different language models through a single, unified interface, simplifying configuration and the process of sending requests and receiving responses across multiple LLMs, and making it straightforward to combine models in Mixture-of-Experts-style setups.
Key Features
- Open Source: Available under the MIT License.
- Unified API Access: Proxy requests to multiple language model APIs, including OpenAI, Ollama, Groq, Cohere, and more.
- Environment Configuration: Easily configure API keys and URLs through environment variables (see the sketch after this list).
- Tool-use Support: Enable function calling capabilities across supported providers with a unified API.
- MCP Integration: Seamlessly connect to Model Context Protocol servers for automatic tool discovery and execution.
- Agent-To-Agent (A2A): Coordinate multiple specialized agents to extend LLM capabilities with external services.
- Streaming Responses: Stream tokens in real time as they are generated by language models.
- Docker Support: Use Docker and Docker Compose for easy setup and deployment.
- Kubernetes Support: Ready for deployment in Kubernetes environments.
- OpenTelemetry: Monitor and analyze performance.
- Production Ready: Built with production in mind, with configurable timeouts and TLS support.
- Lightweight: Includes only essential libraries and runtime, resulting in a small binary of roughly 10.8 MB.
- Minimal Resource Consumption: Designed to consume minimal resources and keep a low footprint.
- Documentation: Well documented with examples and guides.
- Tested: Extensively covered by unit and integration tests.
- Maintained: Actively maintained and developed.
- Scalable: Scales easily and runs in distributed environments, for example with a Horizontal Pod Autoscaler (HPA) in Kubernetes.
- Compliance and Data Privacy: The project collects no data or analytics, supporting compliance and data privacy requirements.
- Self-Hosted: Can be self-hosted for complete control over the deployment environment.
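As a sketch of the environment-based configuration, provider credentials and endpoints are supplied through environment variables before starting the gateway. The variable names below follow a common `<PROVIDER>_API_KEY` pattern but are illustrative assumptions; consult the configuration reference for the exact names your version expects.

```bash
# Illustrative provider configuration via environment variables.
# Variable names are assumptions; check the configuration reference for exact names.
export OPENAI_API_KEY="sk-..."                  # key for a hosted provider
export GROQ_API_KEY="gsk-..."                   # key for another provider
export OLLAMA_API_URL="http://localhost:11434"  # URL of a local Ollama instance
```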
Getting Started
Ready to try Inference Gateway? Follow our Getting Started guide to install and set up your own instance in minutes.
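If you just want to see the gateway respond locally, a Docker-based run along the following lines is a minimal sketch; the image name, port, and environment variable are assumptions for illustration, so prefer the exact commands from the Getting Started guide.

```bash
# Run the gateway in Docker and expose its HTTP port
# (image name and OPENAI_API_KEY are illustrative; see the Getting Started guide)
docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY="sk-..." \
  ghcr.io/inference-gateway/inference-gateway:latest

# Check that it is up by listing available models
# (assuming the gateway exposes the OpenAI-compatible /v1/models endpoint)
curl http://localhost:8080/v1/models
```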
How It Works
Inference Gateway acts as an intermediary between your applications and various LLM providers. By standardizing the API interactions, it allows you to:
- Access multiple LLM providers through a single integration
- Switch between providers without changing application code (see the sketch after this list)
- Implement sophisticated routing and fallback mechanisms
- Centralize API key management and security policies
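As a minimal sketch of what switching providers looks like, the two requests below differ only in the model field; the endpoint, headers, and request body shape stay the same. The model identifiers are illustrative, so use whichever models your configured providers expose.

```bash
# Request routed to one provider (model names are illustrative)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'

# Same request shape, different provider; no application code changes
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hello"}]}'
```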
Model Context Protocol Integration
Inference Gateway includes native support for the Model Context Protocol (MCP), enabling LLMs to automatically access external tools and data sources. With MCP integration, you can:
- Automatically discover tools from connected MCP servers
- Execute tool calls seamlessly without client-side management
- Connect multiple data sources like filesystems, databases, and APIs
- Extend LLM capabilities with custom tools and integrations
```bash
# Enable MCP with multiple servers
export MCP_ENABLE=true
export MCP_SERVERS="http://filesystem-server:8081/mcp,http://search-server:8082/mcp"

# LLMs automatically get access to all available tools
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "List files and search for recent AI news"}]}'
```
Learn more about MCP Integration and explore our comprehensive examples.
Agent-To-Agent (A2A) Integration
Inference Gateway supports Agent-To-Agent (A2A) integration, enabling LLMs to coordinate with multiple specialized agents simultaneously. This powerful feature allows LLMs to:
- Coordinate multiple agents in a single conversation
- Access specialized services like calendars, calculators, and weather APIs
- Discover agent capabilities automatically
- Scale agent ecosystems with distributed architecture
```bash
# Enable A2A with multiple agents
export A2A_ENABLE=true
export A2A_AGENTS="http://calendar-agent:3001,http://calculator-agent:3002,http://weather-agent:3003"

# LLMs can coordinate all agents in one request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "Schedule a meeting, calculate costs, and check the weather"}]}'
```
Learn more about A2A Integration and see how to build your own agents.
Community
Inference Gateway is an open-source project maintained by a growing community. Contributions are welcome on GitHub.