Observability

Inference Gateway provides robust observability features to help monitor and troubleshoot your deployment. These features include metrics collection, tracing, and logging capabilities that integrate with popular monitoring tools.

OpenTelemetry Integration

Inference Gateway includes built-in support for OpenTelemetry, allowing you to collect metrics, traces, and logs that provide insights into the system's performance and behavior.

To enable telemetry in your deployment, set the following environment variable:

ENABLE_TELEMETRY=true

Metrics Collection

When telemetry is enabled, Inference Gateway exposes a metrics endpoint at /metrics that can be scraped by Prometheus. Key metrics include:

Request count by provider and model
Request duration by provider and model
Error rates
Resource utilization (CPU, memory)

Kubernetes Monitoring Setup

For Kubernetes deployments, you can implement comprehensive monitoring using Prometheus and Grafana:

Components

Inference Gateway: The main application exposing metrics
Prometheus: Time-series database for storing metrics
Grafana: Visualization platform for metrics dashboards

Implementation Example

Enable telemetry in your ConfigMap:

YAML

apiVersion: v1
kind: ConfigMap
metadata:
  name: inference-gateway
  namespace: inference-gateway
data:
  # General settings
  ENVIRONMENT: "production"
  ENABLE_TELEMETRY: "true"  # Enable telemetry
  ...

Create a ServiceMonitor resource for Prometheus to discover and scrape metrics:

YAML

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: inference-gateway
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: inference-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
  namespaceSelector:
    matchNames:
      - inference-gateway

Install Grafana dashboards to visualize the collected metrics.

Pre-built Dashboards

Inference Gateway provides pre-built Grafana dashboards that include:

Overview of request volume and latency
Provider-specific metrics
Error rates and types
Resource utilization

These dashboards can be imported directly into your Grafana installation.

Logging

Inference Gateway outputs structured logs in JSON format that can be collected and analyzed by tools like Elasticsearch, Loki, or other log management systems.

For detailed information on setting up a complete monitoring stack with Inference Gateway, check out the monitoring examples in the GitHub repository.