Inference Gateway CLI

The Inference Gateway CLI (infer) is a Go-based command-line tool for working with the Inference Gateway. It provides an interactive chat interface with a terminal UI (TUI), an autonomous agent mode, tool integration, and conversation management.

Current Version: v0.58.0 (Breaking changes expected until stable)

What Makes It Special

  • Zero-Configuration Setup: Just add your API keys to a .env file and start chatting - the gateway manages itself
  • Autonomous Agent Mode: Delegate complex tasks to an AI agent that works iteratively until completion
  • Rich Tool Integration: LLMs can execute commands, read/write files, search code, browse the web, and interact with GitHub
  • Smart Safety System: Configurable approval workflow with real-time diff visualization for file changes
  • Flexible Modes: Toggle between Standard, Plan (read-only), and Auto-Accept (YOLO) modes during chat
  • Beautiful TUI: Scrollable interface with syntax highlighting, tool result expansion, and multiple themes

Installation

Terminal
# Latest version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash
Terminal
# Specific version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --version v0.47.0
Terminal
# Custom directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --install-dir "$HOME/.local/bin"

Go Install

If you have Go installed:

Terminal
go install github.com/inference-gateway/cli@latest

Manual Download

Download binaries from the GitHub releases page. Binaries are signed with Cosign for verification.

Build from Source

Terminal
git clone https://github.com/inference-gateway/cli.git
cd cli
go build -o infer

Quick Start


Initialize your project and start using the CLI:

Terminal
# Initialize configuration
infer init

# Check gateway status
infer status

# Start interactive chat
infer chat

# Get help
infer --help

Core Commands

Essential Commands

infer init

Initialize your project with .infer directory and configuration:

Terminal
infer init

Creates .infer/config.yaml with default settings and tool configurations.

infer status

Check gateway health and resource usage:

Terminal
infer status

infer chat

Launch interactive chat with a rich terminal user interface (TUI):

Terminal
infer chat

Key Features:

  • Real-time streaming responses with syntax highlighting
  • Scrollable chat history with mouse wheel and keyboard support
  • Tool result expansion/collapse for detailed inspection
  • Model switching during conversation
  • Three operational modes: Standard, Plan, and Auto-Accept

Navigation & Shortcuts:

  • Shift + Arrow Down/Up: Scroll through chat history
  • Ctrl+R: Toggle expanded view of tool results
  • Shift+Tab: Cycle through agent modes (Standard → Plan → Auto-Accept)

infer agent

Autonomous agent mode - Execute complex tasks in the background:

Terminal
infer agent "Analyze this codebase and suggest improvements"
infer agent "Fix the failing tests in the test suite"
infer agent "Implement a new feature based on issue #123"

The agent operates autonomously with task analysis, planning, execution, and validation phases.

Agent Modes

The CLI supports three operational modes that change how the agent behaves and what tools it can access. Toggle between modes anytime during a chat session using Shift+Tab.

Standard Mode (Default) 🎯

Normal operation with all configured tools and approval checks enabled.

What you get:

  • Full access to all tools defined in your configuration
  • Approval prompts for sensitive operations (Write, Edit, Delete, Bash)
  • Real-time diff visualization for file modifications
  • Balanced safety and functionality

Best for: General development work, exploring codebases, collaborative coding

Example:

Terminal
infer chat
> "Refactor the authentication module to use environment variables"
# Agent will analyze code, propose changes, and ask for approval before modifying files

Plan Mode (Read-Only) 📋

Designed for planning and analysis without executing changes.

What you get:

  • Restricted to the Read, Grep, and Tree tools
  • Cannot modify files or execute commands
  • Provides detailed step-by-step implementation plans
  • Safe exploration of unfamiliar codebases

Best for: Code reviews, architecture analysis, understanding before implementing

Example:

Terminal
infer chat
# Press Shift+Tab to switch to Plan Mode (shows: 📋 Plan Mode indicator)
> "How should I implement user authentication with JWT tokens?"
# Agent explores code structure and provides detailed plan without making changes

Auto-Accept Mode (YOLO) ⚡

All tool executions are automatically approved - maximum speed, minimum friction.

What you get:

  • Full access to all configured tools
  • Zero approval prompts - immediate execution
  • All safety guardrails bypassed
  • Rapid iteration and automation

Best for: Trusted environments, rapid prototyping, repetitive tasks, time-sensitive work

⚠️ Important: Use with caution. Ensure you have:

  • Version control (git) with clean working tree
  • Backups of important files
  • Clear understanding of the task
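
The first item on this checklist can be verified before a session starts. The following is a minimal sketch, not part of the CLI: tree_is_clean is a hypothetical helper name, and it relies only on standard git commands.

```shell
# Hypothetical pre-flight check: succeeds only if directory $1 (default: the
# current directory) is a git working tree with no uncommitted or untracked changes.
tree_is_clean() {
  dir="${1:-.}"
  # `git status --porcelain` prints one line per changed or untracked path
  # and nothing at all when the tree is clean.
  [ -z "$(git -C "$dir" status --porcelain 2>/dev/null)" ] &&
    git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

# Usage (commented out so the snippet has no side effects):
# tree_is_clean . && infer chat
```

Chaining the check with && means the chat session only starts when there is nothing to lose; commit or stash first if the check fails.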

Example:

Terminal
infer chat
# Press Shift+Tab twice to switch to Auto-Accept Mode (shows: ⚡ Auto-Accept indicator)
> "Run the test suite, fix all failing tests, and commit the changes"
# Agent executes everything immediately without interruption

Switching Modes

Press Shift+Tab during any chat session to cycle through modes:

Standard Mode → Plan Mode → Auto-Accept Mode → Standard Mode (loops)

The current mode is shown below the input field when not in Standard mode.
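
Because the cycle only moves in one direction, the number of key presses depends on the mode you start from. The sketch below models the cycle as a small state function; next_mode and the lowercase mode names are hypothetical and exist only for illustration, not in the CLI itself.

```shell
# Hypothetical helper: prints the mode that one Shift+Tab press leads to.
next_mode() {
  case "$1" in
    standard)    echo plan ;;
    plan)        echo auto-accept ;;
    auto-accept) echo standard ;;
    *)           echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Starting in Plan Mode: one press reaches Auto-Accept, a second returns to Standard.
next_mode plan
next_mode "$(next_mode plan)"
```

In other words, going from Plan Mode back to Standard Mode takes two presses and passes through Auto-Accept on the way.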

Configuration Management

Initialize Configuration Only

Terminal
infer config init

Agent Configuration

Terminal
# Set default model for chat
infer config agent set-model openai/gpt-4

# Set system prompt
infer config agent set-system "You are a helpful coding assistant"

Tool Management

Terminal
# Enable/disable tool execution
infer config tools enable
infer config tools disable

# Manage command whitelist
infer config tools list
infer config tools validate
infer config tools exec

# Safety settings
infer config tools safety enable    # Require approval prompts
infer config tools safety disable
infer config tools safety status

# Sandbox management
infer config tools sandbox add /protected/path
infer config tools sandbox remove /protected/path
infer config tools sandbox list

Configuration

The CLI uses a layered configuration system. Sources are consulted in the following order of precedence:

  1. Environment Variables (INFER_* prefix) - Highest priority
  2. Command Line Flags
  3. Project Config (.infer/config.yaml)
  4. User Config (~/.infer/config.yaml)
  5. Built-in Defaults - Lowest priority
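
One way to read this list is "the first source that provides a value wins". The sketch below illustrates that rule only; resolve_setting is a hypothetical helper, the URLs on ports 9090 and 7070 are made-up values, and the CLI's actual merging logic may differ.

```shell
# Hypothetical resolver: prints the first non-empty candidate.
# Arguments, in precedence order: env flag project user default.
resolve_setting() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# No environment variable or flag is set, so the project config value
# wins over the user config and the built-in default.
resolve_setting "" "" "http://localhost:9090" "http://localhost:7070" "http://localhost:8080"
```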

Configuration File Structure

The CLI creates a comprehensive YAML configuration:

YAML
gateway:
  url: http://localhost:8080
  api_key: ''
  timeout: 200
  oci: ghcr.io/inference-gateway/inference-gateway:latest # OCI image for auto-running gateway
  run: true # Auto-run gateway if not available
  docker: true # Use Docker to run gateway
  include_models: [] # Only show these models (empty = all)
  exclude_models: # Models to hide from selection
    - ollama_cloud/cogito-2.1:671b
    - ollama_cloud/kimi-k2:1t
    - ollama_cloud/kimi-k2-thinking
    - ollama_cloud/deepseek-v3.1:671b
client:
  timeout: 200
  retry:
    enabled: true
    max_attempts: 3
    initial_backoff_sec: 5
    max_backoff_sec: 60
    backoff_multiplier: 2
    retryable_status_codes: [400, 408, 429, 500, 502, 503, 504]
logging:
  debug: false
  dir: '' # Directory for log files (optional)
tools:
  enabled: true # Tools are enabled by default with safe read-only commands
  sandbox:
    directories: ['.', '/tmp'] # Allowed directories for tool operations
    protected_paths: # Paths excluded from tool access for security
      - .infer/
      - .git/
      - '*.env'
  bash:
    enabled: true
    whitelist:
      commands: # Exact command matches
        - ls
        - pwd
        - tree
        - wc
        - sort
        - uniq
        - head
        - tail
        - task
        - make
        - find
      patterns: # Regex patterns for more complex commands
        - ^git status$
        - ^git branch( --show-current)?( -[alrvd])?$
        - ^git log
        - ^git diff
        - ^git remote( -v)?$
        - ^git show
  read:
    enabled: true
    require_approval: false
  write:
    enabled: true
    require_approval: true # Write operations require approval by default for security
  edit:
    enabled: true
    require_approval: true # Edit operations require approval by default for security
  delete:
    enabled: true
    require_approval: true # Delete operations require approval by default for security
  grep:
    enabled: true
    backend: auto # "auto", "ripgrep", or "go"
    require_approval: false
  tree:
    enabled: true
    require_approval: false
  web_fetch:
    enabled: true
    whitelisted_domains:
      - golang.org
    safety:
      max_size: 8192 # 8KB
      timeout: 30 # 30 seconds
      allow_redirect: true
    cache:
      enabled: true
      ttl: 3600 # 1 hour
      max_size: 52428800 # 50MB
  web_search:
    enabled: true
    default_engine: duckduckgo
    max_results: 10
    engines:
      - duckduckgo
      - google
    timeout: 10
  todo_write:
    enabled: true
    require_approval: false
  github:
    enabled: true
    token: '%GITHUB_TOKEN%'
    base_url: 'https://api.github.com'
    owner: ''
    repo: '' # Default repository (optional)
    safety:
      max_size: 1048576 # 1MB
      timeout: 30 # 30 seconds
  safety:
    require_approval: true
compact:
  output_dir: .infer # Directory for compact command exports
  summary_model: '' # Model to use for summarization (optional)
agent:
  model: '' # Default model for agent operations
  system_prompt: | # System prompt for the main agent
    Autonomous software engineering agent...
  system_prompt_plan: | # System prompt for plan mode
    You are an AI planning assistant in PLAN MODE...
  system_reminders:
    enabled: true
    interval: 4
    reminder_text: |
      System reminder text for maintaining context
  verbose_tools: false
  max_turns: 50 # Maximum number of turns for agent sessions
  max_tokens: 4096 # The maximum number of tokens per request
  max_concurrent_tools: 5 # Maximum parallel tool executions
  optimization:
    enabled: false
    model: '' # Model for optimization (optional)
    min_messages: 10
    buffer_size: 2
git:
  commit_message:
    model: '' # Model for AI commit messages (optional)
    system_prompt: |
      Generate a concise git commit message...
storage:
  enabled: true
  type: sqlite # Options: memory, sqlite, postgres, redis
  sqlite:
    path: .infer/conversations.db
  postgres:
    host: localhost
    port: 5432
    database: infer_conversations
    username: ''
    password: ''
    ssl_mode: prefer
  redis:
    host: localhost
    port: 6379
    password: ''
    db: 0
conversation:
  title_generation:
    enabled: true
    model: '' # Model for title generation (optional)
    system_prompt: |
      Generate a concise conversation title...
    batch_size: 10
chat:
  theme: tokyo-night
a2a:
  enabled: true
  agents: [] # List of A2A agent URLs
  cache:
    enabled: true
    ttl: 300 # 5 minutes
  task:
    status_poll_seconds: 5
    polling_strategy: exponential
    initial_poll_interval_sec: 2
    max_poll_interval_sec: 60
    backoff_multiplier: 2.0
    background_monitoring: true
    completed_task_retention: 5
  tools:
    query_agent:
      enabled: true
      require_approval: false
    query_task:
      enabled: true
      require_approval: false
    submit_task:
      enabled: true
      require_approval: false
    download_artifacts:
      enabled: true
      download_dir: /tmp/downloads
      timeout_seconds: 30
      require_approval: false

Environment Variables

Terminal
export INFER_GATEWAY_URL="http://localhost:3000"
export INFER_GATEWAY_API_KEY="your-api-key"
export INFER_AGENT_MODEL="openai/gpt-4"
export INFER_LOGGING_DEBUG="true"
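
The four variables above appear to follow one naming pattern: the YAML key path in upper case, with dots replaced by underscores and an INFER_ prefix. The sketch below applies that pattern; config_key_to_env is a hypothetical helper, and whether every configuration key can be overridden this way is an assumption, not something the examples confirm.

```shell
# Hypothetical helper: derive the environment variable name for a dotted
# config key, following the pattern of the examples above.
config_key_to_env() {
  printf 'INFER_%s\n' "$(printf '%s' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]')"
}

config_key_to_env gateway.api_key   # prints INFER_GATEWAY_API_KEY
config_key_to_env logging.debug     # prints INFER_LOGGING_DEBUG
```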

Advanced Features

Tool System for LLMs

When enabled, LLMs have access to a comprehensive tool suite:

File System Tools

  • Bash: Execute whitelisted shell commands
  • Read/Write/Edit: File operations with safety controls
  • MultiEdit: Batch file edits
  • Delete/Tree: File management and exploration

Search Tools

  • Grep: Fast code search with a selectable backend (auto, ripgrep, or go)
  • WebSearch/WebFetch: Internet research capabilities

Development Tools

  • GitHub API: Repository integration
  • TodoWrite: Task management for complex workflows

Security Features

  • Command Whitelisting: Only approved patterns allowed
  • Approval Prompts: Safety confirmations for dangerous operations
  • Path Protection: Sensitive directories automatically excluded
  • Sandbox Controls: Protected directory management
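
To make command whitelisting concrete, the sketch below checks a command against a few of the exact entries and regex patterns from the default tools.bash.whitelist configuration. It is an illustration only: is_whitelisted is a hypothetical helper, and the CLI's real matcher may treat arguments or chained commands differently.

```shell
# Hypothetical check mirroring the bash whitelist: a command is allowed if it
# equals an exact entry or matches one of the regex patterns.
is_whitelisted() {
  cmd="$1"
  # Exact command matches (subset of the default list).
  for exact in ls pwd tree wc; do
    [ "$cmd" = "$exact" ] && return 0
  done
  # Regex patterns (subset of the default list).
  for pattern in '^git status$' '^git log' '^git remote( -v)?$'; do
    printf '%s\n' "$cmd" | grep -Eq "$pattern" && return 0
  done
  return 1
}

is_whitelisted "git remote -v" && echo "allowed"
is_whitelisted "git push --force" || echo "blocked"
```

Note that a pattern without a trailing $ (such as ^git log) matches any command that begins with that text, so end custom patterns with $ when you want to permit one exact form.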

Conversation Management

Storage Backends

  • SQLite (default): Local file-based storage
  • PostgreSQL: Shared database for teams
  • Redis: High-performance caching

Features

  • Automatic conversation history with search
  • Intelligent title generation
  • Token optimization and compaction
  • Export/import capabilities

Interactive Interface

The TUI provides:

  • Scrollable conversation view
  • Keyboard shortcuts for navigation
  • Tool result expansion/collapse
  • Real-time streaming responses
  • Model switching during conversation

Git Shortcuts

  • /git <command> [args...] - Execute git commands (supports commit, push, status, etc.)
  • /git commit [flags] - NEW: Commit staged changes with AI-generated message
  • /git push [remote] [branch] [flags] - NEW: Push commits to remote repository

The git shortcuts provide intelligent commit message generation using AI when no message is provided with /git commit.

User-Defined Shortcuts

You can create custom shortcuts by adding YAML configuration files in the .infer/shortcuts/ directory.

Configuration File Format

Create files named custom-*.yaml (e.g., custom-1.yaml, custom-dev.yaml) in .infer/shortcuts/:

YAML
shortcuts:
  - name: 'tests'
    description: 'Run all tests in the project'
    command: 'go'
    args: ['test', './...']
    working_dir: '.' # Optional: set working directory

  - name: 'build'
    description: 'Build the project'
    command: 'go'
    args: ['build', '-o', 'infer', '.']

  - name: 'lint'
    description: 'Run linter on the codebase'
    command: 'golangci-lint'
    args: ['run']
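
A shortcut file can be created straight from the terminal. The sketch below writes one using the format shown above; the file name custom-dev.yaml matches the naming rule, while the vet shortcut itself is just an illustrative example.

```shell
# Create the shortcuts directory and a custom shortcut file.
# The "vet" shortcut is illustrative; replace it with your own command.
mkdir -p .infer/shortcuts
cat > .infer/shortcuts/custom-dev.yaml <<'EOF'
shortcuts:
  - name: 'vet'
    description: 'Run go vet on all packages'
    command: 'go'
    args: ['vet', './...']
EOF
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the YAML, so the file is written exactly as typed.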

Real-World Workflows

Workflow 1: Bug Investigation and Fix

Terminal
# Start in Plan Mode to understand the issue first
infer chat
# Shift+Tab to switch to Plan Mode
> "Analyze the bug reported in issue #123 and create a fix plan"

# Agent reads code, identifies root cause, provides detailed plan

# Switch to Standard Mode to implement
# Shift+Tab twice to return to Standard Mode (Plan → Auto-Accept → Standard)
> "Implement the fix according to the plan"
# Agent makes changes, you approve each modification

# Test and commit
> "Run the test suite to verify the fix"
> "/git commit"  # AI generates commit message

Workflow 2: Feature Development from Scratch

Terminal
# Initialize project understanding
infer chat
> "Read the CONTRIBUTING.md and understand the project structure"
> "Find similar features to understand the patterns"

# Create implementation plan
# Shift+Tab to Plan Mode
> "Design the implementation for user profile feature with avatar upload"

# Switch to Auto-Accept for rapid development
# Shift+Tab once more to move from Plan Mode to Auto-Accept Mode
> "Implement the user profile feature according to the plan"
# Agent creates files, writes code, no interruptions

# Review and test
# Shift+Tab back to Standard Mode
> "Review the changes and run all tests"

Workflow 3: Code Review and Refactoring

Terminal
infer chat
# Shift+Tab to switch to Plan Mode for analysis
> "Review the authentication module for security issues and code quality"
# Agent provides detailed analysis

# Shift+Tab twice to return to Standard Mode, then implement the suggested improvements
> "Refactor based on the recommendations, prioritize security issues"
# Agent makes changes with approval

Workflow 4: Working with GitHub Issues

Terminal
# Let the agent read the issue
infer agent "Fix the bug described in GitHub issue #456"

# Agent will:
# 1. Fetch issue details using GitHub tool
# 2. Analyze relevant code
# 3. Implement fix
# 4. Run tests
# 5. Create commit with reference to issue

Workflow 5: Documentation Generation

Terminal
infer chat
> "Generate comprehensive API documentation for all exported functions in the /api directory"
# Agent reads code and creates markdown documentation

> "Create a README.md with installation instructions and examples"
# Agent analyzes project structure and creates README

Workflow 6: Automated Testing

Terminal
infer agent "Create unit tests for all functions in the user service with >80% coverage"

# Agent autonomously:
# - Analyzes the user service code
# - Identifies untested functions
# - Writes comprehensive test cases
# - Runs tests to verify coverage

Tips and Best Practices

For Beginners

  1. Start with Plan Mode: When working with unfamiliar code, use Plan Mode first to understand before making changes
  2. Use Git: Always work in a git repository so you can easily revert changes
  3. Approve Carefully: Read the diff visualization before approving file modifications
  4. Start Small: Begin with simple tasks like "read this file" or "explain this function"

For Power Users

  1. Auto-Accept for Trusted Tasks: Use Auto-Accept mode for repetitive, well-understood tasks
  2. Custom Shortcuts: Create shortcuts for frequent commands (tests, builds, deployments)
  3. Combine with Scripts: Let the agent generate scripts, then use custom shortcuts to run them
  4. A2A Integration: Delegate specialized tasks to A2A agents (testing, documentation, security scans)

Performance Tips

  1. Be Specific: Instead of "fix the code," say "fix the null pointer error in handleRequest function"
  2. Provide Context: Reference file paths, function names, or line numbers when relevant
  3. Use Grep First: For large codebases, use Grep to narrow down relevant files before asking for analysis
  4. Chunk Large Tasks: Break down complex features into smaller, manageable subtasks

Safety Best Practices

  1. Review Diffs: Always review file modification diffs before approving in Standard Mode
  2. Test Before Commit: Run tests after significant changes
  3. Backup Important Work: Have backups before using Auto-Accept mode extensively
  4. Whitelist Commands: Only whitelist commands you understand and trust
  5. Protected Paths: Add sensitive directories to protected paths in configuration

Security & Safety

Command Whitelisting

Terminal
# Add allowed commands
infer config tools whitelist add "npm install"
infer config tools whitelist add "git log --oneline"

# Remove from whitelist
infer config tools whitelist remove "dangerous-command"

Protected Paths

Sensitive directories are automatically protected:

  • .git/ - Git repository data
  • *.env - Environment files
  • node_modules/ - Dependencies
  • Custom paths via sandbox configuration
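
As an illustration of how such patterns behave, the sketch below tests paths against the three protected_paths entries from the default configuration (.infer/, .git/, and *.env) using shell glob matching. is_protected is a hypothetical helper, and the CLI's own matching rules may differ in detail.

```shell
# Hypothetical check: succeeds if the path matches a protected pattern
# taken from the default tools.sandbox.protected_paths list.
is_protected() {
  case "$1" in
    .infer/*|.git/*|*.env) return 0 ;;
    *) return 1 ;;
  esac
}

is_protected ".git/config" && echo "protected"
is_protected "src/main.go" || echo "accessible"
```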

Approval Prompts

Enable safety confirmations:

Terminal
infer config tools safety enable

LLMs will request approval before executing potentially dangerous operations.

Integration Examples

Development Workflow

Terminal
# Initialize new project
infer init

# Interactive development
infer chat
> "Read the authentication module and explain how it works"
> "Refactor the database connection to use connection pooling"

# Autonomous agent for complex tasks
infer agent "Fix all linting errors in the codebase"
infer agent "Implement user authentication with JWT"
infer agent "Review the changes in this PR and suggest improvements"

CI/CD Integration

To be implemented

Troubleshooting

Connection Issues

Terminal
# Check configuration
infer config show

# Verify gateway status
infer status

# Debug mode
infer --debug chat

Permission Issues

Terminal
# Check configuration directory
ls -la ~/.infer/

# Reset configuration
infer config reset

# Re-initialize
infer init

Tool Execution Problems

Terminal
# Check tool status
infer config tools status

# Validate whitelist
infer config tools validate

# Enable debug logging
export INFER_LOGGING_DEBUG=true
infer agent "your task"

Command Reference

Command                      Description
infer init                   Initialize project configuration
infer status                 Check gateway health
infer chat                   Interactive chat session
infer agent <task>           Autonomous task execution
infer config <subcommand>    Configuration management
infer --version              Show version information
infer --help                 Display help information

Support and Resources

The CLI is actively developed with regular updates and new features. Check the repository for the latest releases and announcements.