Observability
Inference Gateway provides robust observability features to help monitor and troubleshoot your deployment. These features include metrics collection, tracing, and logging capabilities that integrate with popular monitoring tools.
OpenTelemetry Integration
Inference Gateway includes built-in support for OpenTelemetry, allowing you to collect metrics, traces, and logs that provide insights into the system's performance and behavior.
To enable telemetry in your deployment, set the following environment variable:
ENABLE_TELEMETRY=true
Metrics Collection
When telemetry is enabled, Inference Gateway exposes a metrics endpoint at /metrics that can be scraped by Prometheus. Key metrics include:
- Request count by provider and model
- Request duration by provider and model
- Error rates
- Resource utilization (CPU, memory)
Kubernetes Monitoring Setup
For Kubernetes deployments, you can implement comprehensive monitoring using Prometheus and Grafana:
Components
- Inference Gateway: The main application exposing metrics
- Prometheus: Time-series database for storing metrics
- Grafana: Visualization platform for metrics dashboards
Implementation Example
- Enable telemetry in your ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: inference-gateway
namespace: inference-gateway
data:
# General settings
APPLICATION_NAME: "inference-gateway"
ENVIRONMENT: "production"
ENABLE_TELEMETRY: "true" # Enable telemetry
...
- Create a ServiceMonitor resource for Prometheus to discover and scrape metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: inference-gateway
namespace: monitoring
spec:
selector:
matchLabels:
app: inference-gateway
endpoints:
- port: http
path: /metrics
interval: 15s
namespaceSelector:
matchNames:
- inference-gateway
- Install Grafana dashboards to visualize the collected metrics.
Pre-built Dashboards
Inference Gateway provides pre-built Grafana dashboards that include:
- Overview of request volume and latency
- Provider-specific metrics
- Error rates and types
- Resource utilization
These dashboards can be imported directly into your Grafana installation.
Logging
Inference Gateway outputs structured logs in JSON format that can be collected and analyzed by tools like Elasticsearch, Loki, or other log management systems.
For detailed information on setting up a complete monitoring stack with Inference Gateway, check out the monitoring examples in the GitHub repository.