# Inference Gateway Documentation
Inference Gateway is a proxy server designed to facilitate access to various language model APIs. It lets users interact with different language models through a unified interface, simplifying configuration and the process of sending requests to and receiving responses from multiple LLMs, and making it easy to build Mixture-of-Experts style setups.
## Key Features
- Open Source: Available under the MIT License.
- Unified API Access: Proxy requests to multiple language model APIs, including OpenAI, Ollama, Groq, Cohere, and more (see the request sketch after this list).
- Environment Configuration: Easily configure API keys and URLs through environment variables.
- Tool-use Support: Enable function-calling capabilities across supported providers with a unified API (see the tool-use sketch after this list).
- Streaming Responses: Stream tokens in real time as they are generated by language models.
- Docker Support: Use Docker and Docker Compose for easy setup and deployment.
- Kubernetes Support: Ready for deployment in Kubernetes environments.
- OpenTelemetry: Monitor and analyze performance.
- Production Ready: Built with production in mind, with configurable timeouts and TLS support.
- Lightweight: Includes only essential libraries and runtime, resulting in a small binary of ~10.8 MB.
- Minimal Resource Consumption: Designed to consume minimal resources and keep a low footprint.
- Documentation: Well documented with examples and guides.
- Tested: Extensively tested with unit tests and integration tests.
- Maintained: Actively maintained and developed.
- Scalable: Easily scalable; can run in a distributed environment with Horizontal Pod Autoscaling (HPA) in Kubernetes.
- Compliance and Data Privacy: Collects no data or analytics, helping with compliance and data privacy.
- Self-Hosted: Can be self-hosted for complete control over the deployment environment.
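As an illustration of the unified API and streaming features above, here is a minimal client-side sketch. It assumes the gateway is running locally on port 8080, exposes an OpenAI-compatible chat completions endpoint, and accepts provider-prefixed model names; the URL, port, endpoint path, and model naming are assumptions, so check the Getting Started guide for the actual values.

```typescript
// Minimal sketch: send a chat completion request through the gateway and
// stream the response. Endpoint path, port, and model naming are assumptions.
const GATEWAY_URL = "http://localhost:8080/v1/chat/completions"; // hypothetical

async function streamChat(prompt: string): Promise<void> {
  const response = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "openai/gpt-4o",     // provider-prefixed model name (assumed convention)
      messages: [{ role: "user", content: prompt }],
      stream: true,               // ask the gateway to stream tokens as they arrive
    }),
  });

  if (!response.ok || !response.body) {
    throw new Error(`Gateway request failed: ${response.status}`);
  }

  // Read the streamed body chunk by chunk and print it as it arrives.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

streamChat("Summarize the benefits of a unified LLM gateway.").catch(console.error);
```

Switching to a different provider would then mostly be a matter of changing the model string (for example to a Groq- or Cohere-hosted model), with no other changes to the client code.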
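For the tool-use feature, the sketch below shows what a function-calling request might look like, assuming the gateway accepts OpenAI-style tool definitions; the tool name, schema, and endpoint are hypothetical and only meant to illustrate the shape of such a request.

```typescript
// Minimal tool-use (function calling) sketch through the gateway,
// assuming it accepts OpenAI-style tool definitions; field names are assumptions.
const body = {
  model: "openai/gpt-4o", // provider-prefixed model name (assumed convention)
  messages: [{ role: "user", content: "What is the weather in Berlin?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool the application implements
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
};

const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});
const completion = await res.json();
// If the model decides to call the tool, the response should contain a tool call
// that the application executes before sending the result back in a follow-up message.
console.log(JSON.stringify(completion, null, 2));
```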
## Getting Started
Ready to try Inference Gateway? Follow our Getting Started guide to install and set up your own instance in minutes.
## How It Works
Inference Gateway acts as an intermediary between your applications and various LLM providers. By standardizing the API interactions, it allows you to:
- Access multiple LLM providers through a single integration
- Switch between providers without changing application code
- Implement sophisticated routing and fallback mechanisms (see the sketch after this list)
- Centralize API key management and security policies
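For example, a simple application-side fallback might retry a request against a second provider when the first one fails. The sketch below reuses the assumed endpoint and provider-prefixed model convention from the earlier example; it is written on the client side and is not presented as a built-in gateway feature.

```typescript
// Application-side fallback sketch: try providers in order until one succeeds.
// Model names and the endpoint are assumptions carried over from the example above.
const MODELS = ["openai/gpt-4o", "groq/llama-3.3-70b-versatile"]; // hypothetical candidates

async function completeWithFallback(prompt: string): Promise<string> {
  for (const model of MODELS) {
    try {
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
      });
      if (!res.ok) continue; // try the next provider on an error status
      const data = await res.json();
      return data.choices?.[0]?.message?.content ?? "";
    } catch {
      // Network or provider failure: fall through to the next candidate.
    }
  }
  throw new Error("All configured providers failed");
}
```

Because the application talks to a single integration point (the gateway), no provider-specific SDKs are needed to implement this kind of policy.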
## Community
Inference Gateway is an open-source project maintained by a growing community. Contributions are welcome on GitHub.