Model Context Protocol (MCP) Integration

The Inference Gateway supports Model Context Protocol (MCP) integration, enabling seamless access to external tools and data sources for Large Language Models (LLMs). This powerful feature automatically discovers and provides tools to LLMs without requiring clients to manage them individually.

What is the Model Context Protocol?

The Model Context Protocol is an open standard that enables AI applications to securely access external data sources and tools. It provides a unified way for LLMs to interact with:

  • File systems - Read, write, and manage files
  • Databases - Query and manipulate data
  • APIs - Access external services and data
  • Search engines - Retrieve information from the web
  • Development tools - Git operations, code analysis, and more

Key Features

  • 🔌 Automatic Tool Discovery: Tools exposed by connected MCP servers are discovered automatically and made available to LLMs
  • 🛠️ Multi-Server Support: Connect to multiple MCP servers simultaneously
  • 🔄 Dynamic Tool Injection: Tools are automatically injected into LLM requests based on available MCP servers
  • 🎯 Seamless Execution: Tool calls are executed transparently and results returned to the LLM
  • 🚀 Zero Client Configuration: Clients don't need to know about or manage individual tools
  • 📊 Built-in Monitoring: Full observability through OpenTelemetry integration

How It Works

  1. Request Processing: Client sends a chat completion request
  2. Tool Discovery: Gateway discovers available tools from all connected MCP servers
  3. Tool Injection: Available tools are automatically added to the LLM request (see the sketch after this list)
  4. LLM Processing: LLM decides which tools to use based on the request
  5. Tool Execution: Gateway executes tool calls via MCP protocol
  6. Result Integration: Tool results are integrated into the conversation
  7. Response Delivery: Complete response is returned to the client
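
Each discovered tool is translated into an OpenAI-style function definition before the request is forwarded to the provider. As an illustrative sketch (the exact payload depends on the gateway version), the get_time tool shown later on this page would be injected roughly as:

JSON
{
  "type": "function",
  "function": {
    "name": "get_time",
    "description": "Get current time in various formats",
    "parameters": {
      "type": "object",
      "properties": {
        "format": {
          "type": "string",
          "description": "Time format (ISO, human-readable, etc.)"
        }
      }
    }
  }
}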

Configuration

Environment Variables

Enable MCP integration by setting these environment variables:

Terminal
# Enable MCP middleware
MCP_ENABLE=true

# Expose MCP endpoints for debugging
MCP_EXPOSE=true

# Comma-separated list of MCP server URLs
MCP_SERVERS="http://time-server:8081/mcp,http://search-server:8082/mcp,http://filesystem-server:8083/mcp"

# Timeout configurations (optional)
MCP_CLIENT_TIMEOUT=10s
MCP_DIAL_TIMEOUT=5s
MCP_TLS_HANDSHAKE_TIMEOUT=5s
MCP_RESPONSE_HEADER_TIMEOUT=5s
MCP_EXPECT_CONTINUE_TIMEOUT=2s
MCP_REQUEST_TIMEOUT=10s
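
For a quick local test, the same variables can be passed straight to the container image used throughout this page. The MCP server URL below is a placeholder for your own server:

Terminal
docker run --rm -p 8080:8080 \
  -e MCP_ENABLE=true \
  -e MCP_EXPOSE=true \
  -e MCP_SERVERS="http://host.docker.internal:8081/mcp" \
  ghcr.io/inference-gateway/inference-gateway:latest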

Using Docker Compose

YAML
services:
  inference-gateway:
    image: ghcr.io/inference-gateway/inference-gateway:latest
    environment:
      - MCP_ENABLE=true
      - MCP_EXPOSE=true
      - MCP_SERVERS=http://mcp-time-server:8081/mcp,http://mcp-search-server:8082/mcp
      - GROQ_API_KEY=${GROQ_API_KEY}
    ports:
      - '8080:8080'
    depends_on:
      - mcp-time-server
      - mcp-search-server

  mcp-time-server:
    image: mcp/time-server:latest
    ports:
      - '8081:8081'

  mcp-search-server:
    image: mcp/search-server:latest
    ports:
      - '8082:8082'

Using Kubernetes

When deploying with the Inference Gateway Helm chart, configure MCP in your values.yaml:

YAML
env:
  MCP_ENABLE: 'true'
  MCP_EXPOSE: 'true'
  MCP_SERVERS: 'http://mcp-time-server:8081/mcp,http://mcp-search-server:8082/mcp'
  MCP_CLIENT_TIMEOUT: '10s'
  MCP_REQUEST_TIMEOUT: '10s'
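
Then deploy the release as usual. The chart reference below is a placeholder; use the one from the Inference Gateway Helm chart documentation:

Terminal
helm upgrade --install inference-gateway <chart-reference> \
  --namespace inference-gateway --create-namespace \
  -f values.yaml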

Usage Examples

Basic Usage

Once configured, MCP tools are automatically available to all LLM requests:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What time is it? Also create a file called hello.txt containing a greeting message."
      }
    ]
  }'
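
The gateway executes any tool calls behind the scenes, so the client receives an ordinary chat completion. An abridged, illustrative response (the shape follows the OpenAI-compatible schema):

JSON
{
  "object": "chat.completion",
  "model": "openai/gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "It is 14:32 UTC, and I have created hello.txt with a greeting."
      },
      "finish_reason": "stop"
    }
  ]
}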

With Streaming

MCP works seamlessly with streaming responses:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Search for information about Model Context Protocol and summarize it"
      }
    ],
    "stream": true
  }'
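
The response arrives as standard Server-Sent Events; tool execution happens server-side between chunks. Abridged, illustrative output:

Terminal
data: {"choices":[{"delta":{"role":"assistant","content":"The Model"}}]}
data: {"choices":[{"delta":{"content":" Context Protocol is an open standard..."}}]}
data: [DONE]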

Multiple Tool Usage

LLMs can use multiple tools in a single conversation:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant with access to various tools."
      },
      {
        "role": "user",
        "content": "Check the current time, search for recent news about AI, and save a summary to a file named daily-ai-update.txt"
      }
    ]
  }'

Available MCP Endpoints

When MCP_EXPOSE=true, the gateway exposes additional endpoints for debugging:

List Available Tools

Terminal
GET /v1/mcp/tools

Returns all available tools from connected MCP servers:

JSON
{
  "tools": [
    {
      "name": "get_time",
      "description": "Get current time in various formats",
      "server": "http://time-server:8081/mcp",
      "inputSchema": {
        "type": "object",
        "properties": {
          "format": {
            "type": "string",
            "description": "Time format (ISO, human-readable, etc.)"
          }
        }
      }
    },
    {
      "name": "search",
      "description": "Perform web search",
      "server": "http://search-server:8082/mcp",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "Search query"
          }
        },
        "required": ["query"]
      }
    }
  ]
}

Check MCP Server Health

Terminal
GET /v1/mcp/health

Returns the health status of all connected MCP servers.
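
An illustrative response shape (exact fields may differ between gateway versions):

JSON
{
  "servers": [
    { "url": "http://time-server:8081/mcp", "status": "healthy" },
    { "url": "http://search-server:8082/mcp", "status": "healthy" }
  ]
}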

Common MCP Server Types

Filesystem Server

Provides file and directory operations:

  • read_file: Read content from files
  • write_file: Write content to files (supports overwrite and append)
  • delete_file: Delete files
  • list_directory: List directory contents (supports recursive listing)
  • create_directory: Create directories
  • file_exists: Check if files or directories exist
  • file_info: Get detailed file/directory information

Search Server

Provides web search capabilities:

  • search: Perform web searches for information
  • find_info: Find specific information on topics

Time Server

Provides time-related utilities:

  • get_time: Get current time in various formats
  • get_timezone: Get timezone information
  • time_difference: Calculate time differences

Database Server

Provides database access:

  • query: Execute SQL queries
  • insert: Insert data into tables
  • update: Update existing records
  • delete: Delete records
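
As with the other server types, these tools are invoked through natural language rather than called directly. For example, assuming a database MCP server is configured:

Terminal
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "How many rows are in the orders table?"
      }
    ]
  }'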

Error Handling

The MCP middleware includes comprehensive error handling:

  • Connection Failures: Graceful fallback when MCP servers are unavailable
  • Tool Execution Errors: Detailed error messages are returned to the LLM (see the sketch after this list)
  • Timeout Handling: Configurable timeouts prevent hanging requests
  • Retry Logic: Automatic retries for transient failures
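
Conceptually, a failed tool call is surfaced to the model as a tool result rather than aborting the request, so the LLM can retry or explain the failure. An illustrative sketch of such an internal tool message (the identifier and wording are hypothetical):

JSON
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "Error: failed to write hello.txt: permission denied"
}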

Security Considerations

Production Deployment

⚠️ Important: The example MCP servers provided in the repository are for demonstration only. For production deployments:

  1. Implement Authentication: Use proper authentication mechanisms
  2. Add Authorization: Implement role-based access control
  3. Input Validation: Validate and sanitize all inputs
  4. Rate Limiting: Implement rate limiting to prevent abuse
  5. Audit Logging: Log all tool executions for security monitoring
  6. Network Security: Use TLS for all MCP communications
  7. Sandboxing: Isolate MCP servers and limit their capabilities
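
For point 6, TLS is used whenever the configured server URLs are HTTPS (the hostname below is a placeholder); the handshake timeout is governed by MCP_TLS_HANDSHAKE_TIMEOUT shown earlier:

Terminal
MCP_SERVERS="https://tools.internal.example.com/mcp"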

Debugging and Monitoring

MCP Inspector

Use the MCP Inspector for debugging:

Terminal
# Access the inspector (when included in deployment)
open http://localhost:6274

The inspector provides:

  • Server connection status
  • Available tools exploration
  • Interactive tool testing
  • Protocol message monitoring

Logging

Enable debug logging for MCP operations:

Terminal
LOG_LEVEL=debug

This will log:

  • MCP server connections
  • Tool discovery events
  • Tool execution details
  • Error conditions

Metrics

The MCP middleware exposes the following metrics through OpenTelemetry:

  • mcp_requests_total: Total MCP requests
  • mcp_request_duration: Request duration
  • mcp_tool_calls_total: Total tool calls
  • mcp_errors_total: Total errors
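
Assuming the metrics are exported in Prometheus format (whether and where they are exposed depends on your telemetry configuration; the port below is a placeholder), you can inspect them directly:

Terminal
curl -s http://localhost:<metrics-port>/metrics | grep '^mcp_'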

Examples and Tutorials

Docker Compose Example

See the complete Docker Compose MCP example that includes:

  • Inference Gateway with MCP enabled
  • Multiple MCP servers (time, search, filesystem)
  • MCP Inspector for debugging
  • Ready-to-run configuration

Kubernetes Example

See the Kubernetes MCP example that demonstrates:

  • Helm chart deployment with MCP configuration
  • Multiple MCP servers as Kubernetes services
  • Ingress configuration
  • Comprehensive monitoring setup

Custom MCP Server

To create your own MCP server, implement the MCP specification:

Python
# Example MCP server using FastMCP from the official MCP Python SDK
# (pip install mcp); check the SDK documentation for the current API surface
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-custom-server")

@mcp.tool()
def my_tool(param1: str, param2: int) -> str:
    """Description of what this tool does."""
    # Tool implementation
    return f"Result: {param1} - {param2}"

if __name__ == "__main__":
    # Serve over HTTP so the gateway can connect
    # (FastMCP's streamable HTTP transport defaults to http://localhost:8000/mcp)
    mcp.run(transport="streamable-http")
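
With the server above running locally, point the gateway at it by adding its URL to MCP_SERVERS (the URL assumes FastMCP's default streamable HTTP port and path):

Terminal
MCP_SERVERS="http://localhost:8000/mcp"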

Troubleshooting

Common Issues

MCP Server Connection Failed

Terminal
# Check if MCP server is running
curl http://mcp-server:8081/mcp/health

# Verify network connectivity
kubectl exec -it inference-gateway-pod -- curl http://mcp-server:8081/mcp

Tools Not Appearing

  1. Verify MCP_ENABLE=true
  2. Check MCP_SERVERS configuration
  3. Ensure MCP servers are accessible
  4. Check logs for connection errors

Tool Execution Timeouts

Increase timeout values:

Terminal
MCP_REQUEST_TIMEOUT=30s
MCP_CLIENT_TIMEOUT=30s

Health Checks

Monitor MCP integration health:

Terminal
# Check gateway health
curl http://localhost:8080/health

# Check MCP-specific health (if MCP_EXPOSE=true)
curl http://localhost:8080/v1/mcp/health

# List available tools
curl http://localhost:8080/v1/mcp/tools

Best Practices

  1. Start Simple: Begin with one or two MCP servers and gradually add more
  2. Monitor Performance: Track tool execution times and success rates
  3. Implement Fallbacks: Design your system to work even if some MCP servers are unavailable
  4. Version Management: Use proper versioning for your MCP servers
  5. Documentation: Document your custom tools and their expected inputs/outputs
  6. Testing: Thoroughly test tool interactions before production deployment

Learn More

Ready to get started? Try our examples or check out the Getting Started guide to set up your first MCP-enabled Inference Gateway.