Inference Gateway CLI
The Inference Gateway CLI (infer) is a powerful Go-based command-line tool that provides comprehensive access to the Inference Gateway. It features an interactive chat interface with a rich TUI, autonomous agent capabilities, extensive tool integration, and advanced conversation management.
Current Version: v0.58.0 (Breaking changes expected until stable)
What Makes It Special
- Zero-Configuration Setup: Just add your API keys to a .env file and start chatting - the gateway manages itself
- Autonomous Agent Mode: Delegate complex tasks to an AI agent that works iteratively until completion
- Rich Tool Integration: LLMs can execute commands, read/write files, search code, browse the web, and interact with GitHub
- Smart Safety System: Configurable approval workflow with real-time diff visualization for file changes
- Flexible Modes: Toggle between Standard, Plan (read-only), and Auto-Accept (YOLO) modes during chat
- Beautiful TUI: Scrollable interface with syntax highlighting, tool result expansion, and multiple themes
Installation
Install Script (Recommended)
# Latest version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash
# Specific version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --version v0.47.0
# Custom directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --install-dir $HOME/.local/bin
Go Install
If you have Go installed:
go install github.com/inference-gateway/cli@latest
Manual Download
Download binaries from the GitHub releases page. Binaries are signed with Cosign for verification.
Build from Source
git clone https://github.com/inference-gateway/cli.git
cd cli
go build -o infer
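If the build succeeds, you can place the binary somewhere on your PATH and verify it; the target directory below is just an example:
# Put the freshly built binary on your PATH (directory is an example)
mkdir -p "$HOME/.local/bin"
install -m 0755 infer "$HOME/.local/bin/"
infer --version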
Quick Start

Initialize your project and start using the CLI:
# Initialize configuration
infer init
# Check gateway status
infer status
# Start interactive chat
infer chat
# Get help
infer --help
Core Commands
Essential Commands
infer init
Initialize your project with .infer directory and configuration:
infer init
Creates .infer/config.yaml with default settings and tool configurations.
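To review what was generated, print the resolved configuration (the same command appears under Troubleshooting):
# Show the configuration created by infer init
infer config show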
infer status
Check gateway health and resource usage:
infer status
infer chat
Launch interactive chat with a rich terminal user interface (TUI):
infer chat
Key Features:
- Real-time streaming responses with syntax highlighting
- Scrollable chat history with mouse wheel and keyboard support
- Tool result expansion/collapse for detailed inspection
- Model switching during conversation
- Three operational modes: Standard, Plan, and Auto-Accept
Navigation & Shortcuts:
- Shift + Arrow Down/Up: Scroll through chat history
- Ctrl+R: Toggle expanded view of tool results
- Shift+Tab: Cycle through agent modes (Standard → Plan → Auto-Accept)
infer agent
Autonomous agent mode - Execute complex tasks in the background:
infer agent "Analyze this codebase and suggest improvements"
infer agent "Fix the failing tests in the test suite"
infer agent "Implement a new feature based on issue #123"
The agent operates autonomously with task analysis, planning, execution, and validation phases.
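A typical sequence is to set the agent's default model once and then hand it a task; both commands are shown elsewhere in this document:
# Pin the default model, then delegate a task to the agent
infer config agent set-model openai/gpt-4
infer agent "Fix the failing tests in the test suite"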
Agent Modes
The CLI supports three operational modes that change how the agent behaves and what tools it can access. Toggle between modes anytime during a chat session using Shift+Tab.
Standard Mode (Default) 🎯
Normal operation with all configured tools and approval checks enabled.
What you get:
- Full access to all tools defined in your configuration
- Approval prompts for sensitive operations (Write, Edit, Delete, Bash)
- Real-time diff visualization for file modifications
- Balanced safety and functionality
Best for: General development work, exploring codebases, collaborative coding
Example:
infer chat
> "Refactor the authentication module to use environment variables"
# Agent will analyze code, propose changes, and ask for approval before modifying files
Plan Mode (Read-Only) 📋
Designed for planning and analysis without executing changes.
What you get:
- Limited to Read, Grep, and Tree tools only
- Cannot modify files or execute commands
- Provides detailed step-by-step implementation plans
- Safe exploration of unfamiliar codebases
Best for: Code reviews, architecture analysis, understanding before implementing
Example:
infer chat
# Press Shift+Tab to switch to Plan Mode (shows: 📋 Plan Mode indicator)
> "How should I implement user authentication with JWT tokens?"
# Agent explores code structure and provides detailed plan without making changes
Auto-Accept Mode (YOLO) ⚡
All tool executions are automatically approved - maximum speed, minimum friction.
What you get:
- Full access to all configured tools
- Zero approval prompts - immediate execution
- All safety guardrails bypassed
- Rapid iteration and automation
Best for: Trusted environments, rapid prototyping, repetitive tasks, time-sensitive work
⚠️ Important: Use with caution. Ensure you have:
- Version control (git) with clean working tree
- Backups of important files
- Clear understanding of the task
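A quick pre-flight check before switching to Auto-Accept (plain git, nothing CLI-specific):
# Make sure the working tree is clean before letting the agent run unattended
git status
git stash   # or commit any work in progress first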
Example:
infer chat
# Press Shift+Tab twice to switch to Auto-Accept Mode (shows: ⚡ Auto-Accept indicator)
> "Run the test suite, fix all failing tests, and commit the changes"
# Agent executes everything immediately without interruption
Switching Modes
Press Shift+Tab during any chat session to cycle through modes:
Standard Mode → Plan Mode → Auto-Accept Mode → Standard Mode (loops)
The current mode is shown below the input field when not in Standard mode.
Configuration Management
Initialize Configuration Only
infer config init
Agent Configuration
# Set default model for chat
infer config agent set-model openai/gpt-4
# Set system prompt
infer config agent set-system "You are a helpful coding assistant"
Tool Management
# Enable/disable tool execution
infer config tools enable
infer config tools disable
# Manage command whitelist
infer config tools list
infer config tools validate
infer config tools exec
# Safety settings
infer config tools safety enable # Require approval prompts
infer config tools safety disable
infer config tools safety status
# Sandbox management
infer config tools sandbox add /protected/path
infer config tools sandbox remove /protected/path
infer config tools sandbox list
Configuration
The CLI supports both project-level and user-level configuration files. Settings are resolved in the following order of precedence, from highest to lowest:
- Environment Variables (INFER_* prefix) - highest priority
- Command Line Flags
- Project Config (.infer/config.yaml)
- User Config (~/.infer/config.yaml)
- Built-in Defaults - lowest priority
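Because environment variables sit at the top of that order, they work for one-off overrides without editing any config file. A minimal sketch (the URL is a placeholder; infer status and infer config show are documented below):
# One-off override: environment variables win over .infer/config.yaml
INFER_GATEWAY_URL="http://localhost:9090" infer status
# Inspect the effective (merged) configuration
infer config show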
Configuration File Structure
The CLI creates a comprehensive YAML configuration:
gateway:
  url: http://localhost:8080
  api_key: ''
  timeout: 200
  oci: ghcr.io/inference-gateway/inference-gateway:latest # OCI image for auto-running gateway
  run: true # Auto-run gateway if not available
  docker: true # Use Docker to run gateway
  include_models: [] # Only show these models (empty = all)
  exclude_models: # Models to hide from selection
    - ollama_cloud/cogito-2.1:671b
    - ollama_cloud/kimi-k2:1t
    - ollama_cloud/kimi-k2-thinking
    - ollama_cloud/deepseek-v3.1:671b
client:
  timeout: 200
  retry:
    enabled: true
    max_attempts: 3
    initial_backoff_sec: 5
    max_backoff_sec: 60
    backoff_multiplier: 2
    retryable_status_codes: [400, 408, 429, 500, 502, 503, 504]
logging:
  debug: false
  dir: '' # Directory for log files (optional)
tools:
  enabled: true # Tools are enabled by default with safe read-only commands
  sandbox:
    directories: ['.', '/tmp'] # Allowed directories for tool operations
    protected_paths: # Paths excluded from tool access for security
      - .infer/
      - .git/
      - '*.env'
  bash:
    enabled: true
    whitelist:
      commands: # Exact command matches
        - ls
        - pwd
        - tree
        - wc
        - sort
        - uniq
        - head
        - tail
        - task
        - make
        - find
      patterns: # Regex patterns for more complex commands
        - ^git status$
        - ^git branch( --show-current)?( -[alrvd])?$
        - ^git log
        - ^git diff
        - ^git remote( -v)?$
        - ^git show
  read:
    enabled: true
    require_approval: false
  write:
    enabled: true
    require_approval: true # Write operations require approval by default for security
  edit:
    enabled: true
    require_approval: true # Edit operations require approval by default for security
  delete:
    enabled: true
    require_approval: true # Delete operations require approval by default for security
  grep:
    enabled: true
    backend: auto # "auto", "ripgrep", or "go"
    require_approval: false
  tree:
    enabled: true
    require_approval: false
  web_fetch:
    enabled: true
    whitelisted_domains:
      - golang.org
    safety:
      max_size: 8192 # 8KB
      timeout: 30 # 30 seconds
      allow_redirect: true
    cache:
      enabled: true
      ttl: 3600 # 1 hour
      max_size: 52428800 # 50MB
  web_search:
    enabled: true
    default_engine: duckduckgo
    max_results: 10
    engines:
      - duckduckgo
      - google
    timeout: 10
  todo_write:
    enabled: true
    require_approval: false
  github:
    enabled: true
    token: '%GITHUB_TOKEN%'
    base_url: 'https://api.github.com'
    owner: ''
    repo: '' # Default repository (optional)
    safety:
      max_size: 1048576 # 1MB
      timeout: 30 # 30 seconds
  safety:
    require_approval: true
compact:
  output_dir: .infer # Directory for compact command exports
  summary_model: '' # Model to use for summarization (optional)
agent:
  model: '' # Default model for agent operations
  system_prompt: | # System prompt for the main agent
    Autonomous software engineering agent...
  system_prompt_plan: | # System prompt for plan mode
    You are an AI planning assistant in PLAN MODE...
  system_reminders:
    enabled: true
    interval: 4
    reminder_text: |
      System reminder text for maintaining context
  verbose_tools: false
  max_turns: 50 # Maximum number of turns for agent sessions
  max_tokens: 4096 # The maximum number of tokens per request
  max_concurrent_tools: 5 # Maximum parallel tool executions
  optimization:
    enabled: false
    model: '' # Model for optimization (optional)
    min_messages: 10
    buffer_size: 2
git:
  commit_message:
    model: '' # Model for AI commit messages (optional)
    system_prompt: |
      Generate a concise git commit message...
storage:
  enabled: true
  type: sqlite # Options: memory, sqlite, postgres, redis
  sqlite:
    path: .infer/conversations.db
  postgres:
    host: localhost
    port: 5432
    database: infer_conversations
    username: ''
    password: ''
    ssl_mode: prefer
  redis:
    host: localhost
    port: 6379
    password: ''
    db: 0
conversation:
  title_generation:
    enabled: true
    model: '' # Model for title generation (optional)
    system_prompt: |
      Generate a concise conversation title...
    batch_size: 10
chat:
  theme: tokyo-night
a2a:
  enabled: true
  agents: [] # List of A2A agent URLs
  cache:
    enabled: true
    ttl: 300 # 5 minutes
  task:
    status_poll_seconds: 5
    polling_strategy: exponential
    initial_poll_interval_sec: 2
    max_poll_interval_sec: 60
    backoff_multiplier: 2.0
    background_monitoring: true
    completed_task_retention: 5
  tools:
    query_agent:
      enabled: true
      require_approval: false
    query_task:
      enabled: true
      require_approval: false
    submit_task:
      enabled: true
      require_approval: false
    download_artifacts:
      enabled: true
      download_dir: /tmp/downloads
      timeout_seconds: 30
      require_approval: false
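In practice a project config rarely needs all of these keys. Assuming unspecified keys fall back to the built-in defaults (per the precedence list above), a minimal project-level .infer/config.yaml might contain only the values you want to change:
gateway:
  url: http://localhost:8080
agent:
  model: openai/gpt-4
chat:
  theme: tokyo-night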
Environment Variables
export INFER_GATEWAY_URL="http://localhost:3000"
export INFER_GATEWAY_API_KEY="your-api-key"
export INFER_AGENT_MODEL="openai/gpt-4"
export INFER_LOGGING_DEBUG="true"
Advanced Features
Tool System for LLMs
When enabled, LLMs have access to a comprehensive tool suite:
File System Tools
- Bash: Execute whitelisted shell commands
- Read/Write/Edit: File operations with safety controls
- MultiEdit: Batch file edits
- Delete/Tree: File management and exploration
Search Tools
- Grep: Powered by ripgrep for fast code search
- WebSearch/WebFetch: Internet research capabilities
Development Tools
- GitHub API: Repository integration
- TodoWrite: Task management for complex workflows
Security Features
- Command Whitelisting: Only approved patterns allowed
- Approval Prompts: Safety confirmations for dangerous operations
- Path Protection: Sensitive directories automatically excluded
- Sandbox Controls: Protected directory management
Conversation Management
Storage Backends
- SQLite (default): Local file-based storage
- PostgreSQL: Shared database for teams
- Redis: High-performance caching
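For example, pointing the CLI at a shared PostgreSQL instance only requires changing the storage block. The keys are taken from the configuration reference above; the connection values here are placeholders:
storage:
  enabled: true
  type: postgres # switch from the default sqlite backend
  postgres:
    host: db.internal.example.com
    port: 5432
    database: infer_conversations
    username: infer
    password: '' # prefer an environment variable or secret manager over a plaintext password
    ssl_mode: prefer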
Features
- Automatic conversation history with search
- Intelligent title generation
- Token optimization and compaction
- Export/import capabilities
Interactive Interface
The TUI provides:
- Scrollable conversation view
- Keyboard shortcuts for navigation
- Tool result expansion/collapse
- Real-time streaming responses
- Model switching during conversation
Git Shortcuts
- /git <command> [args...] - Execute git commands (supports commit, push, status, etc.)
- /git commit [flags] - NEW: Commit staged changes with AI-generated message
- /git push [remote] [branch] [flags] - NEW: Push commits to remote repository
The git shortcuts provide intelligent commit message generation using AI when no message is provided with /git commit.
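For example, inside a chat session (remote and branch names are placeholders):
> /git status
> /git commit              # no message provided, so one is generated from the staged changes
> /git push origin main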
User-Defined Shortcuts
You can create custom shortcuts by adding YAML configuration files in the .infer/shortcuts/ directory.
Configuration File Format
Create files named custom-*.yaml (e.g., custom-1.yaml, custom-dev.yaml) in .infer/shortcuts/:
shortcuts:
  - name: 'tests'
    description: 'Run all tests in the project'
    command: 'go'
    args: ['test', './...']
    working_dir: '.' # Optional: set working directory
  - name: 'build'
    description: 'Build the project'
    command: 'go'
    args: ['build', '-o', 'infer', '.']
  - name: 'lint'
    description: 'Run linter on the codebase'
    command: 'golangci-lint'
    args: ['run']
Real-World Workflows
Workflow 1: Bug Investigation and Fix
# Start in Plan Mode to understand the issue first
infer chat
# Shift+Tab to switch to Plan Mode
> "Analyze the bug reported in issue #123 and create a fix plan"
# Agent reads code, identifies root cause, provides detailed plan
# Switch to Standard Mode to implement
# Shift+Tab to return to Standard Mode
> "Implement the fix according to the plan"
# Agent makes changes, you approve each modification
# Test and commit
> "Run the test suite to verify the fix"
> "/git commit" # AI generates commit message
Workflow 2: Feature Development from Scratch
# Initialize project understanding
infer chat
> "Read the CONTRIBUTING.md and understand the project structure"
> "Find similar features to understand the patterns"
# Create implementation plan
# Shift+Tab to Plan Mode
> "Design the implementation for user profile feature with avatar upload"
# Switch to Auto-Accept for rapid development
# Shift+Tab twice to Auto-Accept Mode
> "Implement the user profile feature according to the plan"
# Agent creates files, writes code, no interruptions
# Review and test
# Shift+Tab back to Standard Mode
> "Review the changes and run all tests"
Workflow 3: Code Review and Refactoring
infer chat
# Use Plan Mode for analysis
> "Review the authentication module for security issues and code quality"
# Agent provides detailed analysis
# Implement suggested improvements
> "Refactor based on the recommendations, prioritize security issues"
# Agent makes changes with approval
Workflow 4: Working with GitHub Issues
# Let the agent read the issue
infer agent "Fix the bug described in GitHub issue #456"
# Agent will:
# 1. Fetch issue details using GitHub tool
# 2. Analyze relevant code
# 3. Implement fix
# 4. Run tests
# 5. Create commit with reference to issue
Workflow 5: Documentation Generation
infer chat
> "Generate comprehensive API documentation for all exported functions in the /api directory"
# Agent reads code and creates markdown documentation
> "Create a README.md with installation instructions and examples"
# Agent analyzes project structure and creates README
Workflow 6: Automated Testing
infer agent "Create unit tests for all functions in the user service with >80% coverage"
# Agent autonomously:
# - Analyzes the user service code
# - Identifies untested functions
# - Writes comprehensive test cases
# - Runs tests to verify coverage
Tips and Best Practices
For Beginners
- Start with Plan Mode: When working with unfamiliar code, use Plan Mode first to understand before making changes
- Use Git: Always work in a git repository so you can easily revert changes
- Approve Carefully: Read the diff visualization before approving file modifications
- Start Small: Begin with simple tasks like "read this file" or "explain this function"
For Power Users
- Auto-Accept for Trusted Tasks: Use Auto-Accept mode for repetitive, well-understood tasks
- Custom Shortcuts: Create shortcuts for frequent commands (tests, builds, deployments)
- Combine with Scripts: Let the agent generate scripts, then use custom shortcuts to run them
- A2A Integration: Delegate specialized tasks to A2A agents (testing, documentation, security scans)
Performance Tips
- Be Specific: Instead of "fix the code," say "fix the null pointer error in handleRequest function"
- Provide Context: Reference file paths, function names, or line numbers when relevant
- Use Grep First: For large codebases, use Grep to narrow down relevant files before asking for analysis
- Chunk Large Tasks: Break down complex features into smaller, manageable subtasks
Safety Best Practices
- Review Diffs: Always review file modification diffs before approving in Standard Mode
- Test Before Commit: Run tests after significant changes
- Backup Important Work: Have backups before using Auto-Accept mode extensively
- Whitelist Commands: Only whitelist commands you understand and trust
- Protected Paths: Add sensitive directories to protected paths in configuration
Security & Safety
Command Whitelisting
# Add allowed commands
infer config tools whitelist add "npm install"
infer config tools whitelist add "git log --oneline"
# Remove from whitelist
infer config tools whitelist remove "dangerous-command"
Protected Paths
Sensitive directories are automatically protected:
- .git/ - Git repository data
- *.env - Environment files
- node_modules/ - Dependencies
- Custom paths via sandbox configuration
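Custom paths can be added with the sandbox commands shown under Tool Management, or directly in .infer/config.yaml using the sandbox keys from the configuration reference; the secrets/ entry below is just an example:
tools:
  sandbox:
    protected_paths:
      - .infer/
      - .git/
      - '*.env'
      - secrets/ # example: protect an additional directory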
Approval Prompts
Enable safety confirmations:
infer config tools safety enable
LLMs will request approval before executing potentially dangerous operations.
Integration Examples
Development Workflow
# Initialize new project
infer init
# Interactive development
infer chat
> "Read the authentication module and explain how it works"
> "Refactor the database connection to use connection pooling"
# Autonomous agent for complex tasks
infer agent "Fix all linting errors in the codebase"
infer agent "Implement user authentication with JWT"
infer agent "Review the changes in this PR and suggest improvements"
CI/CD Integration
To be implemented
Troubleshooting
Connection Issues
# Check configuration
infer config show
# Verify gateway status
infer status
# Debug mode
infer --debug chat
Permission Issues
# Check configuration directory
ls -la ~/.infer/
# Reset configuration
infer config reset
# Re-initialize
infer init
Tool Execution Problems
# Check tool status
infer config tools status
# Validate whitelist
infer config tools validate
# Enable debug logging
export INFER_LOGGING_DEBUG=true
infer agent "your task"
Command Reference
| Command | Description |
|---|---|
| infer init | Initialize project configuration |
| infer status | Check gateway health |
| infer chat | Interactive chat session |
| infer agent <task> | Autonomous task execution |
| infer config <subcommand> | Configuration management |
| infer --version | Show version information |
| infer --help | Display help information |
Support and Resources
- Repository: github.com/inference-gateway/cli
- Issues: GitHub Issues
- Releases: GitHub Releases
The CLI is actively developed with regular updates and new features. Check the repository for the latest releases and announcements.