Inference Gateway CLI
The Inference Gateway CLI (infer) is a powerful Go-based command-line tool providing comprehensive access to the Inference Gateway with interactive chat, autonomous agents, Computer Use tools, and development workflows.
Current Version: v0.109.0 (Breaking changes expected until stable)
Key Features
- 🚀 Zero-Configuration Setup - Add API keys and start chatting
- 🤖 Autonomous Agent Mode - Delegate complex tasks with iterative execution
- 🖥️ Computer Use Tools - GUI automation with screenshot, mouse, and keyboard control
- 🛠️ Rich Tool Integration - File operations, code search, web access, GitHub integration
- 🔒 Smart Safety System - Configurable approval workflow with diff visualization
- 🎨 Beautiful TUI - Scrollable interface with syntax highlighting and multiple themes
- 🌐 Web Terminal - Browser-based interface with tabbed sessions
- 💬 Remote Messaging Channels - Control the agent from Telegram and other platforms
- 🧠 Agent Skills - Reusable, model-readable instruction folders loaded on demand, portable across vendors
- 💰 Cost Tracking - Real-time token usage and cost calculation
Installation
Install Script (Recommended)
# Latest version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash
# Specific version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --version v0.97.0
# Custom directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --install-dir $HOME/.local/bin
Go Install
go install github.com/inference-gateway/cli@latest
Manual Download
Download binaries from the GitHub releases page. Binaries are signed with Cosign for verification.
Build from Source
git clone https://github.com/inference-gateway/cli.git
cd cli
go build -o infer
Quick Start

# Initialize configuration
infer init
# Generate AGENTS.md documentation for AI agents (recommended for new projects)
infer chat
> /init
# Check gateway status
infer status
# Start interactive chat
infer chat
# Launch web terminal
infer chat --web
# Autonomous agent mode
infer agent "Analyze this codebase and suggest improvements"
# Get help
infer --help
Generating AGENTS.md
For new projects, use the /init shortcut to automatically generate an AGENTS.md file. This file provides structured documentation that helps AI agents understand your project:
infer chat
> /init
The agent will:
- Analyze your project structure with the Tree tool
- Examine configuration files, build systems, and documentation
- Generate a comprehensive AGENTS.md including:
  - Project overview and technologies
  - Architecture and structure
  - Development environment setup
  - Key commands (build, test, lint, run)
  - Testing instructions
  - Project conventions and coding standards
  - Important files and configurations
This documentation helps other AI agents (and developers) quickly understand how to work with your project.
Core Commands
| Command | Description | Key Features |
|---|---|---|
| infer init | Initialize project configuration | Creates .infer/config.yaml with defaults |
| infer status | Check gateway health | Shows resource usage and connectivity |
| infer chat | Interactive chat TUI | Streaming, scrolling, tool expansion, mode switching |
| infer chat --web | Web-based terminal | Browser interface, tabbed sessions, remote access |
| infer agent <task> | Autonomous task execution | Background operation, task planning, validation |
| infer config <cmd> | Configuration management | Model, tools, safety, sandbox settings |
Chat Interface Features
Navigation:
- Shift + Arrow Down/Up: Scroll chat history
- Ctrl+R: Toggle tool result expansion
- Shift+Tab: Cycle agent modes (Standard → Plan → Auto-Accept)
- Ctrl+K: Toggle model thinking blocks
Capabilities:
- Real-time streaming with syntax highlighting
- Mouse wheel and keyboard scrolling
- Model switching during conversation
- Tool result inspection
- Cost tracking in status bar
- Collapsible thinking blocks
Agent Modes
Toggle between modes anytime during chat using Shift+Tab.
| Mode | Tools | Approval | Best For |
|---|---|---|---|
| Standard (Default) | All configured | Required for Write/Edit/Delete/Bash | General development, collaborative coding |
| Plan (Read-Only) | Read, Grep, Tree only | None | Code reviews, architecture analysis, planning |
| Auto-Accept (YOLO) | All configured | None - immediate execution | Trusted environments, rapid prototyping, automation |
Standard Mode
Full tool access with safety controls and approval prompts for sensitive operations.
infer chat
> "Refactor the authentication module to use environment variables"
# Agent analyzes code, proposes changes, requests approval before modifying
Plan Mode
Analysis and planning without execution. Safe exploration of unfamiliar codebases.
infer chat
# Press Shift+Tab to switch to Plan Mode
> "How should I implement user authentication with JWT tokens?"
# Agent explores code structure and provides detailed plan
Auto-Accept Mode
Zero approval prompts for maximum speed. Use with caution, and only in version-controlled environments where changes can be reverted.
infer chat
# Press Shift+Tab twice to switch to Auto-Accept Mode
> "Run the test suite, fix all failing tests, and commit the changes"
# Agent executes everything immediately
⚠️ Important for Auto-Accept: Ensure clean git working tree and backups.
Computer Use
GUI automation and visual understanding capabilities for interacting with applications and desktop environments.
Display Server Support
Automatic display server detection - no configuration needed:
| Platform | Supported Servers | Notes |
|---|---|---|
| macOS | Quartz (native), X11 (XQuartz) | Quartz automatically detected and used |
| Linux | X11, Wayland | Auto-detection handles both protocols |
Display server type is automatically detected at runtime. No manual configuration required.
Computer Use Tools
| Tool | Description | Key Capabilities |
|---|---|---|
| GetLatestScreenshot | Capture screen regions | Streaming mode, region selection, circular buffer, JPEG format (configurable quality) |
| MouseMove | Control cursor position | Absolute coordinates, relative movement |
| MouseClick | Perform click actions | Left/right/middle clicks, double-click support |
| MouseScroll | Scroll content | Vertical and horizontal scrolling |
| KeyboardType | Type text and keys | Plain text, key combinations (Ctrl+C, Cmd+V), configurable typing delay |
| GetFocusedApp | Identify active app | Returns focused application name |
| ActivateApp | Switch applications | Focus and activate specific apps |
Screenshot Tool Features
Streaming Mode:
- Maintains circular buffer of recent screenshots
- Configurable buffer size (default: 5)
- Configurable capture interval (default: 3 seconds)
- Efficient memory management
- Fast access to recent captures
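The streaming buffer behaves like a fixed-size ring: once it is full, each new capture evicts the oldest one. A minimal Python sketch of that behaviour, assuming the default buffer size of 5. `ScreenshotBuffer` and its methods are hypothetical names used for illustration, not the CLI's internals:

```python
from collections import deque
from typing import Optional


class ScreenshotBuffer:
    """Fixed-size ring of recent captures (hypothetical helper, not the CLI's code)."""

    def __init__(self, size: int = 5) -> None:
        # deque(maxlen=...) silently drops the oldest item once the buffer is full.
        self._frames = deque(maxlen=size)

    def add(self, frame: bytes) -> None:
        self._frames.append(frame)

    def latest(self) -> Optional[bytes]:
        # Most recent capture, or None if nothing has been captured yet.
        return self._frames[-1] if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)
```

Memory use stays bounded by the buffer size no matter how long streaming runs, which is the property the list above calls efficient memory management.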
Image Optimization:
- Automatic resolution scaling (max: 1920x1080, target: 1024x768)
- JPEG compression with configurable quality (default: 85%)
- Reduces bandwidth and storage requirements
- Optional capture overlay for debugging
Region Selection:
- Full screen capture
- Custom region coordinates (x, y, width, height)
- Multiple monitor support
Floating Window
Real-time visualization of agent activity:
computer_use:
  floating_window:
    enabled: true
    respawn_on_close: true # Auto-restart if closed
    position: top-right # top-left, top-right, bottom-left, bottom-right
    always_on_top: true # Keep window above other apps
Features:
- Always-on-top overlay
- Shows agent actions in real-time
- Configurable position
- Auto-respawn option if accidentally closed
- Non-intrusive design
- Available on all platforms with GUI support
Computer Use Configuration
computer_use:
  enabled: true
  floating_window:
    enabled: true
    respawn_on_close: true
    position: top-right
    always_on_top: true
  screenshot:
    enabled: true
    max_width: 1920 # Maximum capture width
    max_height: 1080 # Maximum capture height
    target_width: 1024 # Target resize width
    target_height: 768 # Target resize height
    format: jpeg # jpeg or png
    quality: 85 # JPEG quality (1-100)
    streaming_enabled: true
    capture_interval: 3 # Seconds between captures
    buffer_size: 5 # Number of screenshots to buffer
    temp_dir: '' # Temporary storage directory
    log_captures: false # Log each capture
    show_overlay: true # Show capture overlay
  rate_limit:
    enabled: true
    max_actions_per_minute: 60
    window_seconds: 60
  tools:
    mouse_move:
      enabled: true
    mouse_click:
      enabled: true
    mouse_scroll:
      enabled: true
    keyboard_type:
      enabled: true
      max_text_length: 1000
      typing_delay_ms: 100
    get_focused_app:
      enabled: true
    activate_app:
      enabled: true
Safety and Rate Limiting
Rate Limiting:
- Default: 60 actions per minute
- Prevents runaway automation
- Configurable threshold
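One common way to enforce a limit such as 60 actions per 60 seconds is a sliding window over recent action timestamps. The sketch below shows that technique in Python. It is an assumption about how such a limiter could work, not a description of the CLI's actual algorithm, and `ActionRateLimiter` is a hypothetical name:

```python
import time
from collections import deque
from typing import Optional


class ActionRateLimiter:
    """Sliding-window limiter (illustrative; the CLI's algorithm may differ)."""

    def __init__(self, max_actions: int = 60, window_seconds: float = 60.0) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # limit reached: the caller should wait or reject
        self._timestamps.append(now)
        return True
```

The optional `now` argument only exists to make the behaviour easy to test with fixed timestamps.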
Safety Controls:
- Approval prompts in Standard Mode
- Auto-approve in YOLO mode
- Activity logging for audit trails
- Command execution monitoring
Best Practices:
- Use Standard Mode for initial exploration
- Enable logging for debugging
- Set appropriate rate limits
- Monitor activity logs
- Test in safe environments first
Example Use Cases
infer chat
> "Take a screenshot and analyze the error dialog"
> "Click the Submit button in the center of the screen"
> "Type 'Hello World' and press Enter"
> "Switch to the Terminal app and run ls command"
> "Find the Save button and click it"
Tools & Capabilities
When tools are enabled, LLMs have access to a comprehensive suite across multiple categories.
Tool Categories
| Category | Tools | Description |
|---|---|---|
| File System | Read, Write, Edit, MultiEdit, Delete, Tree, Grep | File operations and search with safety controls |
| Command Execution | Bash, BashOutput, KillShell, ListShells | Whitelisted shell execution and background shell control |
| Web | WebSearch, WebFetch, Github | Internet research, content fetching, GitHub API |
| Workflow | TodoWrite, Schedule, RequestPlanApproval | Task tracking, cron jobs, plan-mode approval |
| A2A Integration | A2A_QueryAgent, A2A_SubmitTask, A2A_QueryTask | Delegate to specialized agents — see A2A |
| Computer Use | GetLatestScreenshot, MouseMove, MouseClick, MouseScroll, KeyboardType, GetFocusedApp, ActivateApp | GUI automation — see the Computer Use section above |
| MCP | MCP_<server>_<tool> | Dynamically registered tools from MCP servers — see MCP |
File System Tools
Read
Read a file from the local filesystem with an optional line range. Handles text files and PDFs.
- Parameters: file_path (required, absolute or relative), limit (default 2000 lines), offset (default 1)
- Approval: not required (read-only)
- Notes: lines longer than 2000 characters are truncated; output is returned in cat -n format
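To make the offset, limit, and truncation rules concrete, here is a small Python sketch that numbers lines in the style of `cat -n`. It works on an in-memory list of lines rather than a file, and `number_lines` is a hypothetical helper, not the tool's implementation:

```python
def number_lines(lines, offset=1, limit=2000, max_line_chars=2000):
    """Return up to `limit` lines starting at 1-based `offset`, numbered like `cat -n`."""
    selected = lines[offset - 1 : offset - 1 + limit]
    out = []
    for number, line in enumerate(selected, start=offset):
        text = line[:max_line_chars]  # truncate over-long lines
        # cat -n right-aligns the line number in a 6-character column, then a tab.
        out.append(f"{number:6d}\t{text}")
    return "\n".join(out)
```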
Write
Write content to a file on disk. Overwrites the existing file at the given path.
- Parameters: file_path (required, absolute), content (required)
- Approval: required by default
- Notes: if the file exists, the Read tool must have been used first; respects configured path exclusions (.git/, *.env, .infer/)
Edit
Perform an exact string replacement in a single file.
- Parameters: file_path (required), old_string (required; must match exactly and be unique unless replace_all is set), new_string (required; must differ from old_string), replace_all (default false)
- Approval: required by default
- Notes: the file must have been Read at least once in the conversation; indentation must be preserved exactly
MultiEdit
Apply a sequence of edits to a single file atomically — either all succeed or none are applied.
- Parameters: file_path (required), edits (required array; each item has old_string, new_string, optional replace_all)
- Approval: required by default
- Notes: edits are applied in order, each operating on the result of the previous one, so plan them so that earlier edits don't invalidate later matches
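The all-or-nothing behaviour can be pictured as applying every edit to a working copy and only returning it if each step succeeds. A minimal Python sketch under that assumption, using the same uniqueness rule as Edit; `apply_multi_edit` is a hypothetical name and operates on a string instead of a file:

```python
def apply_multi_edit(text, edits):
    """Apply edits in order to a working copy; return it only if every edit succeeds."""
    working = text
    for index, edit in enumerate(edits):
        old, new = edit["old_string"], edit["new_string"]
        count = working.count(old)
        if count == 0 or (count > 1 and not edit.get("replace_all", False)):
            # Raising here leaves the caller's original text untouched.
            raise ValueError(f"edit {index} matched {count} times")
        working = working.replace(old, new)
    return working
```

Because the second edit sees the output of the first, an edit that renames a symbol can make a later edit's old_string disappear, which is the ordering pitfall the notes warn about.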
Delete
Delete a file or directory. Wildcards are supported when enabled.
- Parameters: path (required; supports patterns like *.txt or temp/*), recursive (default false), force (default false), format (text or json)
- Approval: required by default
- Notes: restricted to the current working directory for safety
Tree
Display a directory tree, similar to the Unix tree command.
- Parameters: path (default .), max_depth (1–10, default 3), max_files (1–1000, default 100), respect_gitignore (default true), show_hidden (default false), format (text or json)
- Approval: not required
- Notes: uses the system tree binary when available, otherwise falls back to a built-in implementation
Grep
Powerful regex search across files. Uses ripgrep when available, otherwise a built-in Go implementation.
- Parameters: pattern (required regex), path (default cwd), glob (e.g. *.ts, **/*.tsx), type (e.g. go, py, rust), output_mode (content|files_with_matches|count, default files_with_matches), -i, -n, -A, -B, -C, multiline, head_limit
- Approval: not required
- Backend: configurable via tools.grep.backend (auto|ripgrep|go)
- Notes: respects .gitignore; auto-excludes .git, node_modules, .infer, vendor, dist, build, target
Command Execution
Bash
Execute a whitelisted bash command. Only commands matching the configured whitelist (exact commands or regex patterns) can run.
- Parameters: command (required; must match the whitelist), format (text or json)
- Approval: configurable via tools.bash.require_approval
tools:
  bash:
    enabled: true
    whitelist:
      commands:
        - ls
        - pwd
        - git status
      patterns:
        - ^git branch.*
        - ^npm (install|test|run).*
    require_approval: false
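The check implied by this configuration can be sketched in a few lines of Python: a command is allowed if it equals one of the listed commands or matches one of the regex patterns. This assumes exact entries must match the whole command, which may not be how the CLI treats them; `is_whitelisted` is a hypothetical helper:

```python
import re

# Values copied from the example configuration above.
ALLOWED_COMMANDS = {"ls", "pwd", "git status"}
ALLOWED_PATTERNS = [
    re.compile(r"^git branch.*"),
    re.compile(r"^npm (install|test|run).*"),
]


def is_whitelisted(command: str) -> bool:
    """True if the command is an exact entry or matches a whitelist pattern."""
    command = command.strip()
    if command in ALLOWED_COMMANDS:
        return True
    return any(pattern.match(command) for pattern in ALLOWED_PATTERNS)
```

Note that a pattern ending in `.*` accepts anything after the prefix, so `^git branch.*` also admits `git branch -D main`; tighter patterns give tighter control.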
BashOutput, KillShell, ListShells
Background-shell management. These tools are only registered when tools.bash.background_shells.enabled: true.
- BashOutput: bash_id (required), filter (optional regex). Returns only new output since the last read.
- KillShell: shell_id (required). Sends SIGTERM, then SIGKILL after 5 seconds if the shell doesn't exit.
- ListShells: no parameters. Lists all running and recently completed background shells with their IDs, state, and elapsed time.
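The SIGTERM-then-SIGKILL escalation described for KillShell is a standard graceful-shutdown pattern. A Python sketch of the same idea using `subprocess` follows; `stop_process` is a hypothetical helper, and the child here is a stand-in Python process rather than a real background shell:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it if it is still alive after the grace period."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


# Demo: start a long-running child and stop it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
```

The grace period gives well-behaved programs a chance to flush output and clean up before the forced kill.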
Web Tools
WebSearch
Search the web via DuckDuckGo or Google.
- Parameters: query (required), engine (duckduckgo|google, defaults to the configured engine), limit (1–50, defaults to configured max_results), format (text or json)
tools:
  web_search:
    enabled: true
    default_engine: duckduckgo
    max_results: 10
    engines: [duckduckgo, google]
    timeout: 10
WebFetch
Fetch content from a whitelisted URL. Optionally save the response to disk.
- Parameters: url (required), format (text or json), download (default false; when true, saves under ~/.infer/tmp)
- Notes: only whitelisted domains are allowed; responses are cached (default 15-minute TTL)
tools:
  web_fetch:
    enabled: true
    whitelisted_domains:
      - golang.org
      - github.com
    safety:
      max_size: 8192
      timeout: 30
    cache:
      enabled: true
      ttl: 3600
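A domain whitelist of this kind usually comes down to comparing the URL's hostname against the allowed list. The Python sketch below illustrates one way to do it. Treating subdomains of a listed domain as allowed is an assumption made for the example, and `is_fetch_allowed` is a hypothetical helper:

```python
from urllib.parse import urlparse

# Domains copied from the example configuration above.
WHITELISTED_DOMAINS = ["golang.org", "github.com"]


def is_fetch_allowed(url: str) -> bool:
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: subdomains of a whitelisted domain are also allowed.
    return any(host == d or host.endswith("." + d) for d in WHITELISTED_DOMAINS)
```

Matching on the parsed hostname, rather than searching the raw URL string, avoids accepting look-alike hosts such as evil-github.com.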
Github
Read and write GitHub issues, pull requests, and comments via the GitHub API.
- Parameters: owner, repo (both required unless preconfigured in tools.github.{owner,repo}), resource (default issue; one of issue, issues, pull_request, comments, create_comment, update_comment, create_pull_request, update_pull_request), issue_number, comment_id, comment_body, title, body, head, base (default main), state (open|closed|all, default open), per_page (max 100, default 30)
tools:
  github:
    enabled: true
    token: '%GITHUB_TOKEN%' # env-var interpolation
    base_url: https://api.github.com
    owner: your-org
    repo: your-repo
    safety:
      timeout: 30
    require_approval: false
Workflow Tools
TodoWrite
Create and update a structured task list for the current session. Use for complex multi-step work to track progress and surface intent to the user.
- Parameters: todos (required array; each item has content, status ∈ pending|in_progress|completed, and optional id)
- Approval: not required
- Best practice: keep at most one task in in_progress at a time; mark items completed immediately on finishing
Schedule
Create, list, get, update, or delete cron jobs that fire through the same messaging channel that started the session (e.g. Telegram). Jobs are persisted as YAML under ~/.infer/schedules/ and executed by the infer channels-manager daemon (which hot-reloads via fsnotify).
- Parameters: operation (required: create|list|get|update|delete), job_id (required for get/update/delete), cron_expression (5-field crontab or @every <duration>), prompt, run_once (default false; when true, the job is deleted after firing once), name, description, model (optional model override)
- Approval: required by default
- Notes: each fire creates a brand-new agent session, so no context is carried between runs; only usable from a channel-driven session
"0 8 * * *" every day at 08:00
"*/15 * * * *" every 15 minutes
"0 9 * * 1-5" weekdays at 09:00
"@every 1h" every hour
RequestPlanApproval
Submit a completed plan for user approval. Available only in Plan Mode.
- Parameters: plan (required; the complete, detailed plan text)
- Behavior: pauses execution until the user approves (switches to execution mode) or rejects (provides feedback)
Security Features
- Command Whitelisting: Only approved patterns allowed for Bash tool
- Approval Prompts: Safety confirmations for Write/Edit/Delete/Bash
- Path Protection: Sensitive directories automatically excluded (.git/, *.env, .infer/)
- Sandbox Controls: Restrict tool operations to allowed directories
- Domain Whitelisting: Control web fetch access
- Diff Preview: Visual diff before file modifications
Tool Configuration
# Enable/disable tools
infer config tools enable
infer config tools disable
# Safety settings
infer config tools safety enable
infer config tools safety disable
infer config tools safety status
# Sandbox management
infer config tools sandbox add /protected/path
infer config tools sandbox remove /protected/path
infer config tools sandbox list
Configuration
Configuration lives in two file layers (project and user). Each setting is resolved across all sources in the following order of precedence, from highest to lowest:
Configuration Precedence
| Priority | Source | Example |
|---|---|---|
| 1 (Highest) | Environment Variables | INFER_GATEWAY_URL, INFER_AGENT_MODEL |
| 2 | Command Line Flags | --model, --debug |
| 3 | Project Config | .infer/config.yaml |
| 4 | User Config | ~/.infer/config.yaml |
| 5 (Lowest) | Built-in Defaults | Internal defaults |
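Read top to bottom, the table describes a first-match lookup: the highest-priority source that defines a setting wins. A minimal Python sketch of that resolution order, assuming each source has already been loaded into a dictionary with normalized keys; `resolve_setting` and the key name `agent.model` are hypothetical:

```python
def resolve_setting(key, env, flags, project, user, defaults):
    """Return the first value found, checking sources from highest to lowest priority."""
    for source in (env, flags, project, user, defaults):
        if key in source:
            return source[key]
    raise KeyError(key)


# Example: the project config overrides the user config, and an
# environment variable overrides both.
model = resolve_setting(
    "agent.model",
    env={"agent.model": "openai/gpt-4"},
    flags={},
    project={"agent.model": "project-model"},
    user={"agent.model": "user-model"},
    defaults={"agent.model": "default-model"},
)
```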
Key Configuration Areas
Gateway Settings:
- Gateway URL and API key
- Timeout and retry configuration
- OCI image for auto-running gateway
- Model filtering (include/exclude lists)
Agent Configuration:
- Default model for operations
- System prompts (main and plan mode)
- System reminders interval
- Max turns and tokens
- Parallel tool execution (default: 5 concurrent)
Tool Settings:
- Enable/disable individual tools
- Approval requirements per tool
- Command whitelists and patterns
- Sandbox directories
- Protected paths
Storage Backends:
- SQLite (default) - local file storage
- PostgreSQL - shared database for teams
- Redis - high-performance caching
- In-memory - temporary sessions
Conversation Features:
- Automatic history with search
- AI-generated titles
- Token optimization and compaction
- Export/import capabilities
Essential Environment Variables
export INFER_GATEWAY_URL="http://localhost:8080"
export INFER_GATEWAY_API_KEY="your-api-key"
export INFER_AGENT_MODEL="openai/gpt-4"
export INFER_LOGGING_DEBUG="true"
export GITHUB_TOKEN="your-github-token"
Configuration Commands
# Initialize configuration
infer config init
# Agent settings
infer config agent set-model openai/gpt-4
infer config agent set-system "You are a helpful coding assistant"
# View current configuration
infer config show
# Reset to defaults
infer config reset
See the full configuration reference for detailed options.
Shortcuts
The CLI provides built-in shortcuts and supports custom user-defined shortcuts.
Built-in Shortcuts
| Shortcut | Description | Example |
|---|---|---|
| /init | Generate AGENTS.md documentation | /init |
| /init-github-action | Setup GitHub Action integration | /init-github-action |
| /git <cmd> | Git operations | /git status, /git commit, /git push |
| /scm <cmd> | GitHub operations | /scm pr-create, /scm issue view 123 |
| /a2a | View connected A2A agents | /a2a |
| /skills <cmd> | Manage Agent Skills | /skills list, /skills install <url> |
Git Shortcuts
# Execute git commands
/git status
/git branch
# AI-generated commit message
/git commit
# Push to remote
/git push origin main
SCM (GitHub) Shortcuts
# List GitHub issues
/scm issues
# View issue details
/scm issue 123
# Create pull request with AI-powered plan
/scm pr-create
GitHub Action Setup
The /init-github-action shortcut launches an interactive wizard for setting up AI-powered issue automation using GitHub Apps and the infer-action GitHub Action. This wizard streamlines the process of creating GitHub Apps, managing credentials, configuring repository secrets, and generating workflows that respond to issue mentions with @infer.
Key Features:
- Interactive wizard for creating or configuring GitHub Apps
- Supports both personal and organization repositories
- Automatic workflow file generation in .github/workflows/
- Private key management with interactive file picker
- GitHub App reusability across multiple repositories
- Auto-opens browser with pre-filled app creation forms
- Multi-step guided setup process
Prerequisites:
- GitHub account with repository access
- Admin permissions for creating GitHub Apps (required for organization repositories)
- Downloaded private key file (.pem) from GitHub (after app creation)
Usage:
infer chat
> /init-github-action
Wizard Flow:
- Check Existing Configuration: Detects if a GitHub App is already configured
- App ID Input: Enter existing App ID or create a new GitHub App
- Private Key Selection: Interactive file picker to select your .pem private key file
- Repository Configuration: Configure repository secrets and permissions
- Workflow Creation: Automatically generates GitHub Action workflow files
Creating a New GitHub App:
When creating a new app, the wizard opens GitHub with pre-configured settings:
- App Name: infer-bot (customizable)
- Required Permissions:
  - Contents: Write access
  - Pull Requests: Write access
  - Issues: Write access
  - Metadata: Read access
- Webhooks: Disabled by default (can be enabled later if needed)
Steps for First-Time Setup:
- Run /init-github-action in chat mode
- Choose to create a new GitHub App
- Browser opens with pre-filled GitHub App creation form
- Complete the app creation on GitHub
- Download the private key (.pem file) from GitHub
- Return to CLI and enter the App ID shown on GitHub
- Use the file picker to select your downloaded .pem file
- Wizard creates workflow files in .github/workflows/
Reusing GitHub Apps:
The same GitHub App can be reused across multiple repositories:
cd another-project
infer chat
> /init-github-action
# Enter the same App ID and use the same private key file
Generated Workflow Files:
The wizard creates GitHub Action workflows in .github/workflows/infer.yml that:
- Trigger on issue events (opened, edited) and issue comments
- Generate GitHub App tokens for authentication
- Execute AI-powered agents via the @infer mention trigger
- Support multiple LLM providers (OpenAI, Anthropic, DeepSeek, etc.)
- Provide full repository access (issues, contents, pull requests)
Example Generated Workflow:
name: Infer
on:
  issues:
    types: [opened, edited]
  issue_comment:
    types: [created]
permissions:
  issues: write
  contents: write
  pull-requests: write
jobs:
  infer:
    runs-on: ubuntu-24.04
    steps:
      - name: Generate GitHub App Token
        id: generate-token
        uses: actions/create-github-app-token@<version>
        with:
          app-id: ${{ secrets.INFER_APP_ID }}
          private-key: ${{ secrets.INFER_APP_PRIVATE_KEY }}
          owner: ${{ github.repository_owner }}
      - name: Checkout Repository
        uses: actions/checkout@<version>
        with:
          token: ${{ steps.generate-token.outputs.token }}
      - name: Run Infer Agent
        uses: inference-gateway/infer-action@<version>
        with:
          github-token: ${{ steps.generate-token.outputs.token }}
          trigger-phrase: '@infer'
          model: 'deepseek/deepseek-v4-pro'
          max-turns: 50
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          google-api-key: ${{ secrets.GOOGLE_API_KEY }}
          deepseek-api-key: ${{ secrets.DEEPSEEK_API_KEY }}
Repository Secrets Configuration:
After running the wizard, configure these secrets in your GitHub repository settings:
- INFER_APP_ID - Your GitHub App ID
- INFER_APP_PRIVATE_KEY - Your GitHub App private key (.pem file contents)
- Provider API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.)
Usage in Issues:
Once configured, mention @infer in any issue or issue comment to activate the agent:
@infer Please analyze this bug and suggest a fix
For more information on the infer-action GitHub Action, see the GitHub Action documentation.
Custom Shortcuts
Create YAML files in .infer/shortcuts/ directory. Shortcuts support three types:
1. Simple Commands
Execute a single command:
# .infer/shortcuts/simple.yaml
shortcuts:
  - name: hello
    description: 'Say hello'
    command: echo
    args:
      - 'Hello from Inference Gateway!'
2. Shortcuts with Subcommands
Group related commands under a parent shortcut:
# .infer/shortcuts/dev.yaml
shortcuts:
  - name: dev
    description: 'Development operations'
    command: bash
    subcommands:
      - name: test
        description: 'Run all tests'
        args:
          - -c
          - 'go test ./...'
      - name: build
        description: 'Build the project'
        args:
          - -c
          - 'go build -o app .'
Usage: /dev test, /dev build
3. AI-Powered Snippets
Use LLM to generate dynamic content based on command output. The snippet.prompt can reference JSON fields from command output using {fieldName} placeholders, and snippet.template uses {llm} for the AI-generated response:
# .infer/shortcuts/ai-commit.yaml
shortcuts:
  - name: ai-commit
    description: 'AI-generated commit message'
    command: bash
    args:
      - -c
      - |
        diff=$(git diff --cached)
        jq -n --arg diff "$diff" '{"diff": $diff}'
    snippet:
      prompt: "Generate commit message for:\n{diff}"
      template: '!git commit -m "{llm}"'
The command must output JSON. Fields are accessible in the prompt template via {fieldName} syntax. The LLM response is accessible via {llm} in the template.
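The two substitution steps can be sketched in Python: fields from the command's JSON output fill the prompt, and the model's reply fills the template. This only illustrates the placeholder idea; `fill_placeholders` is a hypothetical helper, and the model reply is a hard-coded stand-in:

```python
import json
import re


def fill_placeholders(template: str, values: dict) -> str:
    """Replace {name} with values[name]; unknown placeholders are left untouched."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )


# Step 1: the command's JSON output supplies fields for the prompt.
command_output = '{"diff": "+ added line"}'
fields = json.loads(command_output)
prompt = fill_placeholders("Generate commit message for:\n{diff}", fields)

# Step 2: the model's reply (a stand-in string here) fills {llm} in the template.
llm_reply = "Add a line"
final = fill_placeholders('!git commit -m "{llm}"', {"llm": llm_reply})
```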
Advanced Features
Cost Tracking
Real-time token usage and cost calculation displayed in the status bar.
Features:
- Per-model pricing calculation
- Cumulative session costs
- Input and output token tracking
- Status bar indicator (💰 $0.0234)
- Custom pricing support
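Per-model pricing is typically quoted per million tokens, with separate rates for input and output. The arithmetic behind a cost indicator can be sketched as below. The model name and prices are hypothetical placeholders, not the CLI's pricing table:

```python
# Hypothetical prices in USD per one million tokens.
PRICE_PER_MILLION = {
    "example/model": {"input": 2.50, "output": 10.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request given its input and output token counts."""
    price = PRICE_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# A session total is the sum over its requests.
session_cost = sum(
    call_cost("example/model", i, o) for i, o in [(4000, 1000), (2000, 500)]
)
```

With these placeholder rates, a request with 4000 input tokens and 1000 output tokens costs about $0.02.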
View Costs:
# Costs displayed in status bar during chat
infer chat
# Status bar shows: 💰 $0.0234 | Model: openai/gpt-4o
# Export conversation with cost details
infer conversation export <conversation-id>
Model Thinking Visualization
Collapsible thinking blocks for models that support thinking (Claude, o1, etc.).
Features:
- Collapsible blocks with first sentence preview
- Ctrl+K keyboard shortcut to toggle
- Theme-aware styling
- Performance optimization (long thinking blocks collapsed by default)
Usage:
infer chat
# Ask complex question requiring reasoning
> "Design a scalable microservices architecture for e-commerce"
# Model's thinking process displayed in collapsible blocks
# Press Ctrl+K to expand/collapse thinking
Conversation Management
Storage Backends:
- SQLite (default): .infer/conversations.db
- PostgreSQL: Shared team database
- Redis: High-performance caching
- In-memory: Temporary sessions
Features:
- Automatic conversation history
- AI-generated titles (batch: 10 messages)
- Search across conversations
- Export to JSON/Markdown
- Token optimization with compaction
Commands:
# List conversations
infer conversation list
# Show conversation
infer conversation show <id>
# Export conversation
infer conversation export <id>
# Delete conversation
infer conversation delete <id>
MCP Integration
Connect to Model Context Protocol servers for extended capabilities. MCP provides stateless tool execution for external services like databases, file systems, and APIs.
Setup:
Initialize project to create .infer/mcp.yaml:
infer init
Configure MCP servers in .infer/mcp.yaml:
enabled: true
connection_timeout: 30
discovery_timeout: 30
liveness_probe_enabled: true
liveness_probe_interval: 10
servers:
  # Auto-start MCP server in container (recommended)
  - name: 'demo-server'
    enabled: true
    run: true
    oci: 'mcp-demo-server:latest'
    description: 'Demo MCP server'
  # Connect to external MCP server
  - name: 'filesystem'
    url: 'http://localhost:3000/sse'
    enabled: true
    description: 'File system operations'
    exclude_tools:
      - 'delete_file'
CLI Commands:
# Add auto-start MCP server
infer mcp add my-server --run --oci=my-mcp:latest
# List MCP servers
infer mcp list
# Toggle server
infer mcp toggle my-server
# Remove server
infer mcp remove my-server
Using MCP Tools:
MCP tools appear as MCP_<server>_<tool> in chat. Example:
infer chat
> "Use the MCP_demo-server_get_time tool to get current time"
See MCP documentation for detailed integration guide and server development.
Agent Skills
Reusable, model-readable instruction folders that the agent loads on demand. The CLI uses the same on-disk format as Claude Code, Gemini CLI, and OpenAI Codex CLI, so a skill authored for any of those tools drops in unchanged.
Why use skills:
- Lazy-loaded: only the skill list is in the system prompt; bodies are read on demand via the Read tool
- Zero cost when off: disabled by default, no token overhead until you enable
- Portable: drop folders from github.com/anthropics/skills or github.com/google/skills straight into .infer/skills/
Skill format:
A skill is a directory containing a SKILL.md with YAML frontmatter:
---
name: pdf-helper
description: Extract text from PDFs. Use when the user asks to read, summarise, or analyse a PDF file.
---
# PDF Helper
1. Use the Bash tool to invoke `pdftotext input.pdf -` and capture stdout.
2. If the PDF is image-only, fall back to `tesseract` for OCR.
Optional sibling directories (references/, scripts/, assets/) hold supporting material the model reads or executes once the skill is active.
Frontmatter rules:
- name (required): ≤64 chars, lowercase letters/digits/hyphens only, must equal the directory name, must not contain infer, claude, anthropic, gemini, or openai
- description (required): non-empty, ≤1024 chars; make it actionable, since this is the routing signal
- Unknown keys (e.g. Anthropic's allowed-tools:, Gemini's disabled:) are tolerated and ignored
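The frontmatter rules above translate directly into a small validator. The Python sketch below checks only what the rules state; `validate_frontmatter` and its error messages are hypothetical, and the CLI's own validator may behave differently in edge cases:

```python
import re

RESERVED_WORDS = ("infer", "claude", "anthropic", "gemini", "openai")
NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}")


def validate_frontmatter(name: str, description: str, directory: str) -> list:
    """Return a list of rule violations; an empty list means the skill is valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name must be 1-64 lowercase letters, digits, or hyphens")
    if name != directory:
        errors.append("name must equal the directory name")
    if any(word in name for word in RESERVED_WORDS):
        errors.append("name must not contain a reserved vendor string")
    if not description.strip() or len(description) > 1024:
        errors.append("description must be non-empty and at most 1024 chars")
    return errors
```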
Locations:
| Scope | Path | Notes |
|---|---|---|
| Project-local | .infer/skills/<name>/SKILL.md | Overrides user-global of the same name |
| User-global | ~/.infer/skills/<name>/SKILL.md | Personal defaults across projects |
Enabling:
Skills are disabled by default. Enable via config or environment variable:
# .infer/config.yaml
agent:
  skills:
    enabled: true
    disabled_skills: [] # optional list of skill names to skip
INFER_AGENT_SKILLS_ENABLED=true infer chat
Managing skills:
# List discovered skills (works regardless of the enabled flag)
infer skills list
infer skills list --format json
# Install from a public GitHub directory URL
infer skills install https://github.com/anthropics/skills/tree/main/skills/pdf
infer skills install <url> --user # install to ~/.infer/skills instead
infer skills install <url> --overwrite # replace an existing skill folder
# Uninstall by name
infer skills uninstall pdf-helper
infer skills uninstall pdf-helper --user
# Or from inside chat
> /skills list
> /skills install <github-url>
> /skills uninstall <name>
GitHub installer notes:
- URL must point at a directory: https://github.com/<owner>/<repo>/tree/<ref>/<path>. URLs at /blob/ (files) or the repo root are rejected with a clear error.
- Public repos only: private repos and GITHUB_TOKEN auth are not supported in this version.
- Refs containing / (e.g. feature/foo) aren't supported; use a tag or single-segment branch.
- GitHub's unauthenticated API rate limit is 60 req/hour per IP; each install is one tree call plus one raw download per file.
- Frontmatter is re-validated post-download — a half-installed skill is never left on disk.
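The URL rules above can be illustrated with a short parser that accepts only the /tree/ directory form. This is a sketch of the stated constraints in Python, with `parse_skill_url` as a hypothetical helper rather than the installer's actual code:

```python
from urllib.parse import urlparse


def parse_skill_url(url: str) -> dict:
    """Split a GitHub directory URL into owner, repo, ref, and path."""
    parsed = urlparse(url)
    parts = [segment for segment in parsed.path.split("/") if segment]
    # Need at least owner/repo/tree/ref/path; /blob/ URLs and the repo root fail here.
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "tree":
        raise ValueError("expected https://github.com/<owner>/<repo>/tree/<ref>/<path>")
    owner, repo, _, ref = parts[:4]
    return {"owner": owner, "repo": repo, "ref": ref, "path": "/".join(parts[4:])}
```

Taking the ref as a single path segment also shows why refs containing / cannot be supported by this simple scheme: the URL alone does not say where the ref ends and the path begins.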
Security:
A skill instructs the model to run shell commands, read files, or call external APIs. Treat skills like any other piece of executable content — only install from trusted sources. The CLI's normal tool-approval system still gates each call, but a malicious skill could craft a plausible-looking Bash command. The name validator rejects vendor strings to make impersonating an official skill harder.
A2A Integration
Delegate specialized tasks to Agent-to-Agent compatible agents.
Setup:
# Initialize agents configuration
infer agents init
# Add remote agent
infer agents add calendar-agent http://calendar.example.com
# Add local agent with Docker
infer agents add my-agent http://localhost:8081 --oci ghcr.io/myorg/agent:latest --run
# List agents
infer agents list
# View agent details
infer agents show calendar-agent
Usage:
infer chat
> "Schedule a meeting tomorrow at 2 PM using the calendar agent"
> /a2a # View connected agents
See A2A documentation for creating custom agents, or use the ADL CLI to scaffold new A2A agents from YAML definitions.
Parallel Tool Execution
Execute up to 5 tools concurrently for improved performance.
Configuration:
agent:
  max_concurrent_tools: 5 # Default: 5
Benefits:
- Faster multi-file operations
- Concurrent web fetches
- Parallel code searches
- Reduced total execution time
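The idea behind a concurrency cap can be sketched with a bounded worker pool. This is a generic illustration, not the CLI's scheduler: `run_tools` is a hypothetical name, and the sketch assumes each tool call is an independent zero-argument callable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

MAX_CONCURRENT_TOOLS = 5  # mirrors the documented default of agent.max_concurrent_tools


def run_tools(calls: List[Callable[[], T]], limit: int = MAX_CONCURRENT_TOOLS) -> List[T]:
    """Run callables with at most `limit` in flight; results keep input order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(call) for call in calls]
        # Collect in submission order so callers can pair results with requests.
        return [future.result() for future in futures]
```

With a cap of 5, a batch of 12 slow calls runs in roughly three waves instead of twelve sequential steps, which is where the reduced total execution time comes from.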
Workflows
Bug Investigation and Fix
infer chat
# Shift+Tab to Plan Mode
> "Analyze bug in issue #123 and create fix plan"
# Shift+Tab to Standard Mode
> "Implement the fix according to the plan"
# Test and commit
> "Run test suite to verify"
> "/git commit"
Feature Development
infer chat
> "Read CONTRIBUTING.md and understand project structure"
# Shift+Tab to Plan Mode
> "Design implementation for user profile feature with avatar upload"
# Shift+Tab twice to Auto-Accept Mode
> "Implement the user profile feature according to the plan"
# Shift+Tab to Standard Mode
> "Review changes and run all tests"
Code Review and Refactoring
infer chat
# Plan Mode for analysis
> "Review authentication module for security issues and code quality"
# Standard Mode for implementation
> "Refactor based on recommendations, prioritize security issues"
GitHub Issue Resolution
infer agent "Fix the bug described in GitHub issue #456"
# Agent autonomously:
# 1. Fetches issue details
# 2. Analyzes relevant code
# 3. Implements fix
# 4. Runs tests
# 5. Creates commit referencing issue
Best Practices
For Beginners
- Start with Plan Mode for unfamiliar code
- Always work in git repositories
- Review diff visualizations before approving
- Begin with simple tasks
For Power Users
- Use Auto-Accept for trusted, repetitive tasks
- Create custom shortcuts for frequent commands
- Combine with scripts for automation
- Leverage A2A for specialized workflows
Performance Tips
- Be specific with file paths and function names
- Use Grep to narrow down relevant files first
- Break large tasks into smaller subtasks
- Provide context with references
Safety
- Review diffs before approving modifications
- Run tests after significant changes
- Have backups before extensive Auto-Accept usage
- Whitelist only trusted commands
- Add sensitive directories to protected paths
Security
Command Whitelisting
The Bash tool executes only whitelisted commands and patterns:
tools:
  bash:
    whitelist:
      commands: [ls, pwd, tree, git]
      patterns:
        - ^git status$
        - ^git branch.*$
        - ^npm test$
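One way to read the whitelist configuration above is sketched below. The matching semantics are an assumption, not confirmed behavior: this sketch treats `commands` as a match on the first word of the command line and `patterns` as regular expressions applied to the whole command. The function name `is_allowed` is hypothetical.

```python
import re
from typing import Iterable


def is_allowed(command: str, commands: Iterable[str], patterns: Iterable[str]) -> bool:
    """Return True if the command passes the (assumed) whitelist rules."""
    stripped = command.strip()
    if not stripped:
        return False
    # Assumption: a whitelisted command name allows any arguments after it.
    if stripped.split()[0] in set(commands):
        return True
    # Assumption: patterns are regexes tested against the full command line.
    return any(re.search(pattern, stripped) for pattern in patterns)


COMMANDS = ["ls", "pwd", "tree", "git"]
PATTERNS = [r"^git status$", r"^git branch.*$", r"^npm test$"]
```

Under this reading, `npm test` is allowed only because of its pattern, while `npm install` is rejected. The sketch does not inspect shell operators such as `&&` or `;`, which a real gate would need to consider.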
Protected Paths
Automatically excluded from tool access:
- .git/ - Repository data
- *.env - Environment files
- .infer/ - Configuration directory
- Custom paths via sandbox config
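A path check along these lines could look like the following sketch. The matching rules are assumptions rather than the CLI's documented behavior: entries ending in `/` are treated as directory names matched at any depth, and other entries as globs matched against the file name. `is_protected` and `PROTECTED` are hypothetical names.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Sequence

PROTECTED = [".git/", "*.env", ".infer/"]


def is_protected(path: str, rules: Sequence[str] = PROTECTED) -> bool:
    """Return True if `path` falls under one of the (assumed) protection rules."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for rule in rules:
        if rule.endswith("/"):
            # Directory rule: protected if any path component matches.
            if rule.rstrip("/") in parts:
                return True
        elif fnmatch(parts[-1], rule):
            # File rule: glob against the final component only.
            return True
    return False
```

For example, `.git/config` and `deploy/prod.env` would be blocked, while `src/main.go` would not.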
Approval Workflow
Enable safety confirmations:
infer config tools safety enable
The LLM must request approval before executing Write, Edit, Delete, or Bash operations; proposed changes are shown with a real-time diff preview.
Troubleshooting
Connection Issues
# Check configuration
infer config show
# Verify gateway status
infer status
# Debug mode
infer --debug chat
Permission Issues
# Check configuration directory
ls -la ~/.infer/
# Reset configuration
infer config reset
# Re-initialize
infer init
Tool Execution Problems
# Check tool status
infer config tools status
# Validate whitelist
infer config tools validate
# Enable debug logging
export INFER_LOGGING_DEBUG=true
infer agent "your task"
Computer Use Issues
# Verify display server
echo $DISPLAY # Linux/X11
# Check permissions (macOS)
# System Preferences > Security & Privacy > Accessibility
# Test screenshot
infer chat
> "Take a screenshot and describe what you see"
Command Reference
| Command | Description |
|---|---|
| infer init | Initialize project configuration |
| infer status | Check gateway health and resource usage |
| infer chat | Interactive chat session (TUI) |
| infer chat --web | Web-based terminal interface |
| infer agent <task> | Autonomous task execution |
| infer skills <subcommand> | Manage Agent Skills (list, install, uninstall) |
| infer channels-manager | Start the remote messaging daemon (Channels) |
| infer config <subcommand> | Configuration management |
| infer agents <subcommand> | A2A agent management |
| infer conversation <subcommand> | Conversation history management |
| infer --version | Show version information |
| infer --help | Display help information |
Support and Resources
- Repository: github.com/inference-gateway/cli
- Issues: GitHub Issues
- Releases: GitHub Releases
- Documentation: Full Configuration Reference
The CLI is actively developed with regular updates and new features. Check the repository for the latest releases and announcements.