Inference Gateway CLI
The Inference Gateway CLI (infer) is a powerful Go-based command-line tool providing comprehensive access to the Inference Gateway with interactive chat, autonomous agents, Computer Use tools, and development workflows.
Current Version: v0.109.0 (Breaking changes expected until stable)
Key Features
- Zero-Configuration Setup - Add API keys and start chatting
- Autonomous Agent Mode - Delegate complex tasks with iterative execution
- Computer Use Tools - GUI automation with screenshot, mouse, and keyboard control
- Rich Tool Integration - File operations, code search, web access, GitHub via the
ghCLI - Smart Safety System - Configurable approval workflow with diff visualization
- Beautiful TUI - Scrollable interface with syntax highlighting and multiple themes
- Web Terminal - Browser-based interface with tabbed sessions
- Remote Messaging Channels - Control the agent from Telegram and other platforms (Learn more)
- Agent Skills - Reusable, model-readable instruction folders loaded on demand, portable across vendors (Learn more)
- Cost Tracking - Real-time token usage and cost calculation
Installation
npm / npx (Recommended)
Run the CLI without installing anything (requires Node.js >= 18). The matching native binary is downloaded and cached on first use:
npx @inference-gateway/cli@latest --help
npx @inference-gateway/cli@latest chatOr install it globally:
npm install -g @inference-gateway/cli
infer --helpNot recommended for production - prefer the install script or building from source. Prebuilt binaries cover Linux and macOS on amd64/arm64 (on Windows, use WSL).
Install Script (Recommended)
# Latest version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash
# Specific version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --version v0.97.0
# Custom directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --install-dir $HOME/.local/binGo Install
go install github.com/inference-gateway/cli@latestManual Download
Download binaries from the GitHub releases page. Binaries are signed with Cosign for verification.
Build from Source
git clone https://github.com/inference-gateway/cli.git
cd cli
go build -o inferShell Completions
The CLI ships an infer completion subcommand (provided by fang) that generates completion scripts for bash, zsh, fish, and powershell. Enabling completions adds tab-completion for subcommands, flags, and many flag values.
# Zsh (current session)
source <(infer completion zsh)
# Zsh (persistent) - write to a directory on $fpath
infer completion zsh > "${fpath[1]}/_infer"
# Bash (current session)
source <(infer completion bash)
# Bash (persistent)
infer completion bash > /etc/bash_completion.d/infer
# Fish
infer completion fish > ~/.config/fish/completions/infer.fish
# PowerShell
infer completion powershell | Out-String | Invoke-ExpressionRun infer completion --help to list the supported shells. After writing a persistent completion file, start a new shell (or re-source your shell rc) for it to take effect. If completions do not appear, see Shell Completions Not Working.
Shipped in inference-gateway/cli#592.
Quick Start

# Initialize configuration
infer init
# Generate AGENTS.md documentation for AI agents (recommended for new projects)
infer chat
> /init
# Check gateway status
infer status
# Start interactive chat
infer chat
# Launch web terminal
infer chat --web
# Autonomous agent mode
infer agent "Analyze this codebase and suggest improvements"
# Get help (styled output)
infer --help
# Show version
infer --version
# Enable shell completions for the current shell (zsh example)
source <(infer completion zsh)Generating AGENTS.md
For new projects, use the /init shortcut to automatically generate an AGENTS.md file. This file provides structured documentation that helps AI agents understand your project:
infer chat
> /initThe agent will:
- Analyze your project structure with the Tree tool
- Examine configuration files, build systems, and documentation
- Generate comprehensive
AGENTS.mdincluding:- Project overview and technologies
- Architecture and structure
- Development environment setup
- Key commands (build, test, lint, run)
- Testing instructions
- Project conventions and coding standards
- Important files and configurations
This documentation helps other AI agents (and developers) quickly understand how to work with your project.
Help and Version Output
The CLI's help, error, and version output are rendered with fang, so every command produces styled, colorized output. The samples below are shown as plain text; in a real terminal the headings, flags, and errors are colorized.
Help
infer --help (and --help on any subcommand) prints a styled usage page grouped into usage, commands, and flags. Note that -v is --verbose; version is the long-form --version flag.
infer
A powerful command-line interface for managing and interacting with
the Inference Gateway.
USAGE
infer [command] [--flags]
COMMANDS
init Initialize project configuration
status Check gateway health and resource usage
chat Interactive chat session (TUI)
agent Autonomous task execution
config Configuration management
tools Run and inspect agent tools directly
completion Generate the autocompletion script for the specified shell
version Show version information
FLAGS
-h --help Show help
-v --verbose Verbose output
--version Print version informationErrors
Unknown commands and flags exit non-zero with a styled error message and no noisy usage dump (fang sets cobra's SilenceErrors/SilenceUsage and renders the error itself):
$ infer badcmd
Error: unknown command "badcmd" for "infer"
$ echo $?
1Version
infer --version prints the version, styled by fang:
$ infer --version
infer version v0.109.0The standalone version subcommand is kept for backwards compatibility and prints the same information:
infer versionThe manual
--versionboolean flag was replaced by fang's built-in version handling (fang.WithVersion) in inference-gateway/cli#592. Bothinfer --versionand theinfer versionsubcommand remain supported.
Core Commands
| Command | Description | Key Features |
|---|---|---|
infer init | Initialize project configuration | Creates .infer/config.yaml with defaults |
infer status | Check gateway health | Shows resource usage and connectivity |
infer chat | Interactive chat TUI | Streaming, scrolling, tool expansion, mode switching |
infer chat --web | Web-based terminal | Browser interface, tabbed sessions, remote access |
infer agent <task> | Autonomous task execution | Background operation, task planning, validation |
infer config <cmd> | Configuration management | Generic get/set for any config key |
infer tools <cmd> | Run agent tools directly | Execute a tool or validate a bash command |
Chat Interface Features
Navigation:
- Shift + Arrow Down/Up: Scroll chat history
- Ctrl+R: Toggle tool result expansion
- Shift+Tab: Cycle agent modes (Standard -> Plan -> Auto-Accept)
- Ctrl+K: Toggle model thinking blocks
Capabilities:
- Real-time streaming with syntax highlighting
- Mouse wheel and keyboard scrolling
- Model switching during conversation
- Tool result inspection
- Cost tracking in status bar
- Collapsible thinking blocks
- GitHub issue references - type
#to insert and expand#Ntokens (see below)
GitHub Issue References (#)
Type # in the chat input to open a dropdown of the current repository's open issues - each entry shows the issue number, title, and state. The list is resolved through the gh CLI from the repo's git remote, newest first. Selecting an issue inserts a highlighted #N token into your message.
On submit, every #N token is expanded inline into that issue's title, body, and most recent comments (up to the latest 20) before the message is sent to the model - so the agent works from full issue context without a redundant gh issue view lookup.
infer chat
> Summarize #123 and propose a fix
# "#123" expands into the issue title, body, and recent comments before sendingPrerequisite: the gh CLI must be installed and authenticated, and the working directory must be a git repository with a remote. The feature gracefully no-ops when gh is missing, the directory is not a git repo, the repo has no remote, or authentication has expired - the dropdown simply shows nothing.
Shipped in inference-gateway/cli#574.
Switching models (/model)
/model is the unified model command - it replaces the deprecated /switch:
/model <name>- permanently switch the active model for the rest of the session./model <name> <prompt...>- run a single message with<name>, then restore the session model afterward. Handy for sending one hard question to a stronger model without changing your default./model(no argument) - open the model picker.
infer chat
> /model deepseek/deepseek-v4-pro # switch the session model
> /model anthropic/claude-opus-4-8 Explain this stack trace # one-off, then restore
/switchis deprecated - use/model <name>. Shipped in inference-gateway/cli#618.
Diff viewer and git staging
When the agent proposes file changes (or you open a diff), the diff viewer supports patch-level staging - select individual lines, split hunks, and stage or unstage everything at once. All keys are configurable in .infer/keybindings.yaml (category diff_viewer); the defaults:
| Key | Action |
|---|---|
space / v | Start or clear a line-range selection within the current hunk |
a / u / enter | Apply (stage/unstage) the selected lines - or the whole hunk if none selected |
s | Split the current hunk into smaller, independently stageable blocks |
] / [ | Jump to the next / previous hunk |
A | Stage all changes (git add -A, including untracked files and deletions) |
U | Unstage all changes (git reset -q HEAD) |
Select a range with space/v, navigate, then apply to stage just those lines - or split a mixed hunk with s and stage each block separately. The footer hint reflects whether a selection is active, and hides discard when a staged file is selected (discard only applies to unstaged changes).
Shipped in inference-gateway/cli#618.
Agent Modes
Toggle between modes anytime during chat using Shift+Tab.
| Mode | Tools | Approval | Best For |
|---|---|---|---|
| Standard (Default) | All configured | Required for Write/Edit/Delete/Bash | General development, collaborative coding |
| Plan (Read-Only) | Read, Grep, Tree only | None | Code reviews, architecture analysis, planning |
| Auto-Accept (YOLO) | All configured | None - immediate execution | Trusted environments, rapid prototyping, automation |
Standard Mode
Full tool access with safety controls and approval prompts for sensitive operations.
infer chat
> "Refactor the authentication module to use environment variables"
# Agent analyzes code, proposes changes, requests approval before modifyingPlan Mode
Analysis and planning without execution. Safe exploration of unfamiliar codebases.
infer chat
# Press Shift+Tab to switch to Plan Mode
> "How should I implement user authentication with JWT tokens?"
# Agent explores code structure and provides detailed planAuto-Accept Mode
Zero approval prompts for maximum speed. Use with caution in version-controlled environments.
infer chat
# Press Shift+Tab twice to switch to Auto-Accept Mode
> "Run the test suite, fix all failing tests, and commit the changes"
# Agent executes everything immediatelyImportant for Auto-Accept: Ensure clean git working tree and backups.
Because the per-action approval gate is off in this mode, the agent runs under a dedicated destructive-action safety prompt (prompts.agent.system_prompt_auto). It is told to stop and confirm before irreversible operations - deletes, git push --force, git reset --hard, dropping databases, rm -rf, publishing or releasing - to prefer the reversible path when no user is reachable, and never to print or publish a secret value. It falls back to the standard system_prompt when left blank.
Headless Agent Stream Output
infer agent <task> runs the agent non-interactively and writes a newline-delimited JSON (JSONL) stream to stdout. Each line is one JSON object with a type discriminator, intended for programmatic consumers such as the infer-action GitHub Action. The stream is additive: new type values may be introduced over time and consumers should ignore any type they do not recognize.
infer agent "Refactor the authentication module"Secure by default. A headless run executes in standard mode, so off-list or mutating actions are not auto-run - they are blocked (when no approver is reachable) or sent for IPC approval (under a channel manager). See Headless secure-by-default to opt into more autonomy.
Session stats summary line
When a session completes, the CLI emits a single session_stats line summarizing token usage and computed dollar cost for the run. This lets consumers report real run cost without re-implementing the per-model pricing table.
{
"type": "session_stats",
"message": "Session complete",
"timestamp": "2026-05-29T17:48:55+02:00",
"model": "deepseek/deepseek-v4-flash",
"prompt_tokens": 21000,
"completion_tokens": 1260,
"total_tokens": 22260,
"requests": 7,
"cost": { "input": 0.0021, "output": 0.0008, "total": 0.0029, "currency": "USD" }
}Fields:
| Field | Type | Description |
|---|---|---|
type | string | Always session_stats for this line. |
message | string | Human-readable status, currently Session complete. |
timestamp | string | RFC 3339 timestamp at which the line was emitted. |
model | string | Model used for the run. A single model is attributed per run. |
prompt_tokens | number | Sum of input tokens across all requests in the run. |
completion_tokens | number | Sum of output tokens across all requests in the run. |
total_tokens | number | prompt_tokens + completion_tokens. |
requests | number | Number of LLM requests (turns that reported usage) in the run. |
cost | object | Computed dollar cost for the run - see Cost object. |
Cost object
| Field | Type | Description |
|---|---|---|
input | number | Cost attributed to prompt_tokens using the configured pricing table. |
output | number | Cost attributed to completion_tokens using the configured pricing table. |
total | number | input + output. |
currency | string | ISO 4217 currency code from pricing.currency. Defaults to USD. |
When pricing.enabled: false (or pricing data is unavailable for the model), input, output, and total are all 0 while currency is still populated. The cost object is always present, giving consumers a stable schema.
Behavior notes:
- The line is additive - it does not replace any existing stream output.
- It is emitted once per run, at session completion (including on early errors).
- It is always emitted in
agentmode - there is no flag to enable or disable it. - Cost is attributed to a single model per run.
- Consumers should ignore unknown
typevalues to remain forward-compatible.
Computer Use
GUI automation and visual understanding capabilities for interacting with applications and desktop environments.
Display Server Support
Automatic display server detection - no configuration needed:
| Platform | Supported Servers | Notes |
|---|---|---|
| macOS | Quartz (native), X11 (XQuartz) | Quartz automatically detected and used |
| Linux | X11, Wayland | Auto-detection handles both protocols |
Display server type is automatically detected at runtime. No manual configuration required.
Computer Use Tools
| Tool | Description | Key Capabilities |
|---|---|---|
| GetLatestScreenshot | Capture screen regions | Streaming mode, region selection, circular buffer, JPEG format (configurable quality) |
| MouseMove | Control cursor position | Absolute coordinates, relative movement |
| MouseClick | Perform click actions | Left/right/middle clicks, double-click support |
| MouseScroll | Scroll content | Vertical and horizontal scrolling |
| KeyboardType | Type text and keys | Plain text, key combinations (Ctrl+C, Cmd+V), configurable typing delay |
| GetFocusedApp | Identify active app | Returns focused application name |
| ActivateApp | Switch applications | Focus and activate specific apps |
Screenshot Tool Features
Streaming Mode:
- Maintains circular buffer of recent screenshots
- Configurable buffer size (default: 5)
- Configurable capture interval (default: 3 seconds)
- Efficient memory management
- Fast access to recent captures
Image Optimization:
- Automatic resolution scaling (max: 1920x1080, target: 1024x768)
- JPEG compression with configurable quality (default: 85%)
- Reduces bandwidth and storage requirements
- Optional capture overlay for debugging
Region Selection:
- Full screen capture
- Custom region coordinates (x, y, width, height)
- Multiple monitor support
Floating Window
Real-time visualization of agent activity:
computer_use:
floating_window:
enabled: true
respawn_on_close: true # Auto-restart if closed
position: top-right # top-left, top-right, bottom-left, bottom-right
always_on_top: true # Keep window above other appsFeatures:
- Always-on-top overlay
- Shows agent actions in real-time
- Configurable position
- Auto-respawn option if accidentally closed
- Non-intrusive design
- Available on all platforms with GUI support
Computer Use Configuration
computer_use:
enabled: true
floating_window:
enabled: true
respawn_on_close: true
position: top-right
always_on_top: true
screenshot:
enabled: true
max_width: 1920
max_height: 1080
target_width: 1024
target_height: 768
format: jpeg
quality: 85
streaming_enabled: true
capture_interval: 3
buffer_size: 5
temp_dir: ''
log_captures: false
show_overlay: true
rate_limit:
enabled: true
max_actions_per_minute: 60
window_seconds: 60
tools:
mouse_move:
enabled: true
mouse_click:
enabled: true
mouse_scroll:
enabled: true
keyboard_type:
enabled: true
max_text_length: 1000
typing_delay_ms: 100
get_focused_app:
enabled: true
activate_app:
enabled: trueSafety and Rate Limiting
Rate Limiting:
- Default: 60 actions per minute
- Prevents runaway automation
- Configurable threshold
Safety Controls:
- Approval prompts in Standard Mode
- Auto-approve in YOLO mode
- Activity logging for audit trails
- Command execution monitoring
Best Practices:
- Use Standard Mode for initial exploration
- Enable logging for debugging
- Set appropriate rate limits
- Monitor activity logs
- Test in safe environments first
Example Use Cases
infer chat
> "Take a screenshot and analyze the error dialog"
> "Click the Submit button in the center of the screen"
> "Type 'Hello World' and press Enter"
> "Switch to the Terminal app and run ls command"
> "Find the Save button and click it"Tools & Capabilities
When tools are enabled, LLMs have access to a comprehensive suite across multiple categories.
Tool Categories
| Category | Tools | Description |
|---|---|---|
| File System | Read, Write, Edit, MultiEdit, Delete, Tree, Grep | File operations and search with safety controls |
| Command Execution | Bash, BashOutput, KillShell, ListShells | Allow-listed shell execution (including gh for GitHub) and background shell control |
| Web | WebSearch, WebFetch | Internet research and content fetching |
| Workflow | TodoWrite, Schedule, RequestPlanApproval | Task tracking, cron jobs, plan-mode approval |
| A2A Integration | A2A_QueryAgent, A2A_SubmitTask, A2A_QueryTask | Delegate to specialized agents - see A2A |
| Computer Use | GetLatestScreenshot, MouseMove, MouseClick, MouseScroll, KeyboardType, GetFocusedApp, ActivateApp | GUI automation - see the Computer Use section above |
| MCP | MCP_<server>_<tool> | Dynamically registered tools from MCP servers - see MCP |
File System Tools
Read
Read a file from the local filesystem with an optional line range. Handles text files and PDFs.
- Parameters:
file_path(required, absolute or relative),limit(default 2000 lines),offset(default 1) - Approval: not required (read-only)
- Notes: lines longer than 2000 characters are truncated; output is returned in
cat -nformat
Write
Write content to a file on disk. Overwrites the existing file at the given path.
- Parameters:
file_path(required, absolute),content(required) - Approval: required by default
- Notes: if the file exists, the Read tool must have been used first; respects configured path exclusions (
.git/,*.env,.infer/)
Edit
Perform an exact string replacement in a single file.
- Parameters:
file_path(required),old_string(required - must match exactly and be unique unlessreplace_allis set),new_string(required - must differ fromold_string),replace_all(defaultfalse) - Approval: required by default
- Notes: the file must have been Read at least once in the conversation; indentation must be preserved exactly
MultiEdit
Apply a sequence of edits to a single file atomically - either all succeed or none are applied.
- Parameters:
file_path(required),edits(required array; each item hasold_string,new_string, optionalreplace_all) - Approval: required by default
- Notes: edits are applied in order, each operating on the result of the previous one - plan them so earlier edits don't invalidate later matches
Delete
Delete a file or directory. Wildcards are supported when enabled.
- Parameters:
path(required - supports patterns like*.txtortemp/*),recursive(defaultfalse),force(defaultfalse),format(textorjson) - Approval: required by default
- Notes: restricted to the current working directory for safety
Tree
Display a directory tree, similar to the Unix tree command.
- Parameters:
path(default.),max_depth(1-10, default 3),max_files(1-1000, default 100),respect_gitignore(defaulttrue),show_hidden(defaultfalse),format(textorjson) - Approval: not required
- Notes: uses the system
treebinary when available, otherwise falls back to a built-in implementation
Grep
Powerful regex search across files. Uses ripgrep when available, otherwise a built-in Go implementation.
- Parameters:
pattern(required regex),path(default cwd),glob(e.g.*.ts,**/*.tsx),type(e.g.go,py,rust),output_mode(content|files_with_matches|count, defaultfiles_with_matches),-i,-n,-A,-B,-C,multiline,head_limit - Approval: not required
- Backend: configurable via
tools.grep.backend(auto|ripgrep|go) - Notes: respects
.gitignore; auto-excludes.git,node_modules,.infer,vendor,dist,build,target
Command Execution
Bash
Execute a bash command that matches the active mode's allowed-list. Matching is default-deny: a command auto-runs only when it matches the allowed-list for the current agent mode. Anything unmatched falls through to an approval prompt in chat, or is rejected with an actionable hint in headless infer agent. There is no separate deny list.
- Parameters:
command(required),format(textorjson) - Approval: configurable via
tools.bash.require_approval
Per-mode allowed-list
The allowed-list is configured per agent mode under tools.bash.mode.<mode>.allow. The effective list for a mode is mode.all.allow (the every-mode baseline) unioned with that mode's own entries:
tools:
bash:
enabled: true
require_approval: false
mode:
all: # baseline applied in every mode
allow:
- ls( .*)?
- pwd( .*)?
- git status( .*)?
- git diff( .*)?
plan: # read-only analysis - usually adds nothing
allow: []
standard: # default interactive mode
allow:
- npm (install|test|run).*
auto: # Auto-Accept / YOLO mode
allow:
- .* # unrestricted sentinel- Default-deny. Out of the box only
mode.autoships the.*sentinel;mode.planandmode.standardadd nothing on top ofmode.all, so they reduce to the read-only baseline. GitHub writes (gh issue/pr create|edit|comment),git push, andgit commitare not in the defaults - they fall through to approval until you add them. - Full-command matching. Each entry matches the whole command, so a bare token like
ghallows onlygh- nevergh issue list. Opt into arguments explicitly with a pattern (gh issue.*,npm (install|test|run).*); the default entries use a( .*)?suffix to allow trailing arguments. - The
.*sentinel means unrestricted: any single command runs and the clean-command guard below is skipped. It is the default formode.auto(chat's Auto-Accept mode, toggled with Shift+Tab) and is an explicit opt-in - never a headless default.
Clean-command guard
For every mode except the .* sentinel, each command passes a clean-command guard before the allowed-list is consulted. The guard rejects, regardless of the list:
- Command substitution -
$(...), backticks,<(...),>(...). - Multi-command chains and pipelines - a top-level
|,&&,||,;,&, or newline. Operators inside quotes don't count, sojq '.a | .b'stays a single command. (This closes the oldecho x | xargs rmprefix hole.) - File-write redirections -
>and>>. Benign stream redirects (2>&1,>/dev/null) are stripped first and remain allowed. - Dangerous
findactions --exec,-delete, and the like. A barefindfor read-only discovery is fine. - Environment-variable leaks - a printing or publishing command (
echo,printf,gh issue|pr create|comment|edit) may not expand$VAR. Soecho $AWS_SECRET_ACCESS_KEYis blocked, whilels $DIRstays allowed. A single-quoted or escaped$is treated literally.
A rejected command returns an actionable hint naming what tripped the guard; the model is told to stop and ask, or use an allowed alternative, rather than retry the same call.
Append-only override (CI)
The mode.all baseline takes an append-only override so CI can add a few safe commands without rewriting config or shipping .*:
# Comma- or newline-separated; the env var wins over the flag
export INFER_TOOLS_BASH_ALLOW_APPEND="git commit,git push"
# Flag form
infer agent "Release the changelog" --tools-bash-allow-append "git commit,git push"The extra commands merge onto mode.all.allow, so they auto-run in every mode. There is no replace override - the old tools.bash.whitelist.commands key, the INFER_TOOLS_BASH_WHITELIST_COMMANDS[_APPEND] env vars, and the --tools-bash-whitelist-commands* flags were removed in inference-gateway/cli#618.
BashOutput, KillShell, ListShells
Background-shell management. These tools are only registered when tools.bash.background_shells.enabled: true.
- BashOutput -
bash_id(required),filter(optional regex). Returns only new output since the last read. - KillShell -
shell_id(required). Sends SIGTERM, then SIGKILL after 5 seconds if the shell doesn't exit. - ListShells - no parameters. Lists all running and recently completed background shells with their IDs, state, and elapsed time.
Web Tools
WebSearch
Search the web via DuckDuckGo or Google.
- Parameters:
query(required),engine(duckduckgo|google, defaults to the configured engine),limit(1-50, defaults to configuredmax_results),format(textorjson)
tools:
web_search:
enabled: true
default_engine: duckduckgo
max_results: 10
engines: [duckduckgo, google]
timeout: 10WebFetch
Fetch content from an allowed URL. Optionally save the response to disk.
- Parameters:
url(required),format(textorjson),download(defaultfalse- whentrue, saves under~/.infer/tmp) - Notes: only allowed domains can be fetched; responses are cached (default 15-minute TTL)
tools:
web_fetch:
enabled: true
allowed_domains:
- golang.org
- github.com
safety:
max_size: 8192
timeout: 30
cache:
enabled: true
ttl: 3600GitHub Operations
There is no built-in GitHub tool. The agent performs all GitHub work - issues, pull requests, releases, repository metadata, and the raw API - through the gh CLI run via the Bash tool.
# Issues and pull requests
gh issue view 123
gh issue list --state open
gh pr create --title "fix: handle nil channel" --body "Closes #123"
gh pr diff 456
# Raw API (read-only / GET)
gh api repos/inference-gateway/cli/issues
gh api user --jq .loginRequirements: gh must be installed and authenticated. It uses the standard gh credential chain - run gh auth login, or set GITHUB_TOKEN (or GH_TOKEN). No separate token configuration exists anymore.
Default gh allowed-list
GitHub operations run through Bash, so they obey the Bash allowed-list. The default mode.all baseline auto-approves common read-only gh commands only:
| Auto-approved by default | Examples |
|---|---|
| Read-only reads | gh issue list, gh pr view 5, gh pr diff, gh repo view, gh release view v1 |
| Auth status | gh auth status |
| Search | gh search issues kind:bug, gh search code "func main" |
| Read-only project boards | gh project list, gh project view 3, gh project item-list 3 |
GitHub writes and destructive operations are deliberately left off the defaults. They are not auto-approved - they fall through to the standard approval prompt (in chat) or are blocked (in headless infer agent) until you add them to an allowed-list:
- Issue / PR writes -
gh issue create|edit|comment,gh pr create. - Project writes -
gh project item-add|item-edit. - Destructive -
gh pr merge,gh pr close,gh issue delete,gh repo delete,gh release create,gh run cancel,gh auth login. - Raw
gh api- any call. The previous GET-wildcard auto-approval was dropped; a raw-API need is now opt-in per repo.
Hardened in inference-gateway/cli#618. Earlier defaults auto-approved
gh issue/prwrites and read-onlygh api. They now require approval - add the specific commands you trust to an allowed-list, or use the append override.
The shipped mode.all baseline:
tools:
bash:
mode:
all:
allow:
- gh (issue|pr|repo|release|run|workflow) (list|view|status|diff|checks)( .*)?
- gh auth status( .*)?
- gh search (issues|code|prs|repos|commits)( .*)?
- gh project (list|view|item-list|field-list)( .*)?Migration: the built-in GitHub tool was removed
Breaking change. The built-in
Githubtool was removed in favor of theghCLI (inference-gateway/cli#572). Thetools.githubconfig block and theinfer config tools githubcommands no longer exist. Existing configs that still contain atools.githubsection are ignored - unknown keys are dropped, so they do not error and need no manual cleanup. Replace any scripted use of the old tool with the matchingghcommand (for examplegh issue view,gh pr create,gh api).
Workflow Tools
TodoWrite
Create and update a structured task list for the current session. Use for complex multi-step work to track progress and surface intent to the user.
- Parameters:
todos(required array; each item hascontent,status∈pending|in_progress|completed, and optionalid) - Approval: not required
- Best practice: keep at most one task in
in_progressat a time; mark itemscompletedimmediately on finishing
Schedule
Create, list, get, update, or delete cron jobs that fire through the same messaging channel that started the session (e.g. Telegram). Jobs are persisted as YAML under ~/.infer/schedules/ and executed by the infer channels-manager daemon (which hot-reloads via fsnotify).
- Parameters:
operation(required:create|list|get|update|delete),job_id(required for get/update/delete),cron_expression(5-field crontab or@every <duration>),prompt,run_once(defaultfalse- whentrue, the job is deleted after firing once),name,description,model(optional model override) - Approval: required by default
- Notes: each fire creates a brand-new agent session - no context is carried between runs; only usable from a channel-driven session
"0 8 * * *" every day at 08:00
"*/15 * * * *" every 15 minutes
"0 9 * * 1-5" weekdays at 09:00
"@every 1h" every hourRequestPlanApproval
Submit a completed plan for user approval. Available only in Plan Mode.
- Parameters:
plan(required - the complete, detailed plan text) - Behavior: pauses execution until the user approves (switches to execution mode) or rejects (provides feedback)
Security Features
- Command allow-listing: Default-deny, per-mode allowed-list for the Bash tool
- Approval Prompts: Safety confirmations for Write/Edit/Delete/Bash
- Path Protection: Sensitive directories automatically excluded (
.git/,*.env,.infer/) - Sandbox Controls: Restrict tool operations to allowed directories
- Domain allow-listing: Control web fetch access
- Diff Preview: Colored, syntax-aware diff before file modifications
Tool Configuration
Tool settings are read and written with the generic config commands - there are no per-setting subcommands.
# Enable/disable all tool execution for LLMs
infer config set tools.enabled true
infer config set tools.enabled false
# Enable/disable an individual tool (for example bash)
infer config set tools.bash.enabled true
# Require approval before any tool runs
infer config set tools.safety.require_approval true
# Require approval for a specific tool only (for example bash)
infer config set tools.bash.require_approval true
# Sandbox directories - comma-separated; the whole list is replaced
infer config set tools.sandbox.directories ".,/protected/path"
# Inspect the resulting tools config
infer config get toolsRunning Tools Directly
Run any enabled tool outside a chat session, or check whether a bash command would pass the allowed-list, with the top-level infer tools command.
# Execute a tool by name with JSON arguments (tool names are case-insensitive)
infer tools execute Read '{"file_path":"README.md"}'
infer tools execute grep '{"pattern":"func main","path":"."}'
# Validate whether a bash command is allowed (without running it)
infer tools validate "git status"infer tools execute <tool> [json-args] resolves tool names case-insensitively in the CLI - the agent itself still uses the exact PascalCase names. infer tools validate <command> reports whether a bash command would be permitted by the configured allowed-list, without executing it.
infer tools executeandinfer tools validatemoved fromconfig tools exec/config tools validateto the top-levelinfer toolscommand in inference-gateway/cli#601.
Configuration
Two-layer configuration system with precedence from highest to lowest:
Configuration Precedence
| Priority | Source | Example |
|---|---|---|
| 1 (Highest) | Environment Variables | INFER_GATEWAY_URL, INFER_AGENT_MODEL |
| 2 | Command Line Flags | --model, --debug |
| 3 | Project Config | .infer/config.yaml |
| 4 | User Config | ~/.infer/config.yaml |
| 5 (Lowest) | Built-in Defaults | Internal defaults |
Key Configuration Areas
Gateway Settings:
- Gateway URL and API key
- Timeout and retry configuration
- OCI image for auto-running gateway
- Model filtering (include/exclude lists)
Agent Configuration:
- Default model for operations
- System prompts (main and plan mode)
- System reminders interval
- Max turns and tokens
- Parallel tool execution (default: 5 concurrent)
Tool Settings:
- Enable/disable individual tools
- Approval requirements per tool (whether) and delivery via
tools.safety.approval_behaviour(how) - Per-mode bash allowed-lists (
tools.bash.mode.<mode>.allow) - Sandbox directories
- Protected paths
Storage Backends:
- SQLite (default) - local file storage
- PostgreSQL - shared database for teams
- Redis - high-performance caching
- In-memory - temporary sessions
Conversation Features:
- Automatic history with search
- AI-generated titles
- Token optimization and compaction
- Export/import capabilities
Essential Environment Variables
export INFER_GATEWAY_URL="http://localhost:8080"
export INFER_GATEWAY_API_KEY="your-api-key"
export INFER_AGENT_MODEL="deepseek/deepseek-v4-flash"
export INFER_LOGGING_DEBUG="true"
export GITHUB_TOKEN="your-github-token" # used by the gh CLI credential chain for GitHub operations
# Append a few commands onto the bash allowed-list baseline (comma- or newline-separated)
export INFER_TOOLS_BASH_ALLOW_APPEND="git commit,git push"
# How a needed approval is delivered: prompt | ipc | block
export INFER_TOOLS_SAFETY_APPROVAL_BEHAVIOUR="prompt"Configuration Commands
Configuration uses a generic key/value interface. infer config get reads the effective value of any key; infer config set writes one to config.yaml. Keys are dotted paths into the config (for example agent.model, tools.bash.enabled).
# Initialize configuration
infer config init
# Print the whole effective config (defaults + ~/.infer + .infer + INFER_* env)
infer config get
# Print a single key
infer config get agent.model
# Print as JSON instead of YAML
infer config get --format json
# Set a value - parsed to the field's type (bool, integer, number, or string)
infer config set agent.model deepseek/deepseek-v4-flash
infer config set agent.max_turns 50
infer config set agent.verbose_tools true
# List-valued keys take a comma-separated value (the whole list is replaced)
infer config set tools.sandbox.directories ".,/work/project"
infer config set tools.web_fetch.allowed_domains "golang.org,github.com"
# Target the user-global ~/.infer/config.yaml instead of the project .infer/config.yaml
infer config set agent.model deepseek/deepseek-v4-flash --userspace
# Recreate config.yaml from defaults
infer config init --overwriteSystem prompts are not set via
config set- they live inprompts.yaml(for exampleprompts.agent.system_prompt) and are edited there.
Command Mapping
The per-setting subcommands were removed in inference-gateway/cli#601 in favor of config get/config set and the top-level infer tools command:
| Old command | New command |
|---|---|
config agent set-model X | config set agent.model X |
config agent set-max-turns N | config set agent.max_turns N |
config agent verbose-tools enable | config set agent.verbose_tools true |
config agent skills enable | config set agent.skills.enabled true |
config tools enable | config set tools.enabled true |
config tools bash enable | config set tools.bash.enabled true |
config tools safety enable | config set tools.safety.require_approval true |
config tools safety set bash enabled | config set tools.bash.require_approval true |
config tools sandbox add DIR | config set tools.sandbox.directories ".,DIR" |
config tools grep set-backend rg | config set tools.grep.backend ripgrep |
config tools web-fetch add-domain D | config set tools.web_fetch.allowed_domains "D" |
config export set-model X | config set export.summary_model X |
config show | config get |
config tools exec <tool> | tools execute <tool> |
config tools validate <cmd> | tools validate <cmd> |
See the full configuration reference for detailed options.
Shortcuts
The CLI provides built-in shortcuts and supports custom user-defined shortcuts.
Built-in Shortcuts
| Shortcut | Description | Example |
|---|---|---|
/init | Generate AGENTS.md documentation | /init |
/init-github-action | Setup GitHub Action integration | /init-github-action |
/git <cmd> | Git operations | /git status, /git commit, /git push |
/scm <cmd> | GitHub operations | /scm pr-create, /scm issue view 123 |
/model [name] [msg] | Switch the active model, or run one message with another model (replaces /switch) | /model deepseek/deepseek-v4-pro |
/a2a | View connected A2A agents | /a2a |
/skills <cmd> | Manage Agent Skills | /skills list, /skills install <url> |
/voice [seconds] | Record the mic and transcribe to the input field (requires speech-to-text) | /voice, /voice 8 |
Git Shortcuts
# Execute git commands
/git status
/git branch
# AI-generated commit message
/git commit
# Push to remote
/git push origin mainSCM (GitHub) Shortcuts
# List GitHub issues
/scm issues
# View issue details
/scm issue 123
# Create pull request with AI-powered plan
/scm pr-createVoice Shortcut
The /voice shortcut records audio from your microphone, transcribes it locally with whisper.cpp, and places the text into the input field - ready to review and send. It is disabled by default and only appears when speech_to_text.enabled is true.
# Record until you go quiet (or the max cap), then transcribe
/voice
# Record for at most 8 seconds
/voice 8Recording stops automatically a couple of seconds after you stop speaking (speech_to_text.silence_timeout), at the max_recording_seconds cap, or at the per-call override. See Speech-to-Text for prerequisites, configuration, and model selection.
GitHub Action Setup
The /init-github-action shortcut launches an interactive wizard for setting up AI-powered issue automation using GitHub Apps and the infer-action GitHub Action. This wizard streamlines the process of creating GitHub Apps, managing credentials, configuring repository secrets, and generating workflows that respond to issue mentions with @infer.
For a full reference of
infer-actioninputs, outputs, and workflow recipes (PR review, scheduled summaries, release notes), see the GitHub Action documentation.
Key Features:
- Interactive wizard for creating or configuring GitHub Apps
- Supports both personal and organization repositories
- Automatic workflow file generation in
.github/workflows/ - Private key management with interactive file picker
- GitHub App reusability across multiple repositories
- Auto-opens browser with pre-filled app creation forms
- Multi-step guided setup process
Prerequisites:
- GitHub account with repository access
- Admin permissions for creating GitHub Apps (required for organization repositories)
- Downloaded private key file (
.pem) from GitHub (after app creation)
Usage:
infer chat
> /init-github-actionWizard Flow:
- Check Existing Configuration: Detects if a GitHub App is already configured
- App ID Input: Enter existing App ID or create a new GitHub App
- Private Key Selection: Interactive file picker to select your
.pemprivate key file - Repository Configuration: Configure repository secrets and permissions
- Workflow Creation: Automatically generates GitHub Action workflow files
Creating a New GitHub App:
When creating a new app, the wizard opens GitHub with pre-configured settings:
- App Name:
infer-bot(customizable) - Required Permissions:
- Contents: Write access
- Pull Requests: Write access
- Issues: Write access
- Metadata: Read access
- Webhooks: Disabled by default (can be enabled later if needed)
Steps for First-Time Setup:
- Run
/init-github-actionin chat mode - Choose to create a new GitHub App
- Browser opens with pre-filled GitHub App creation form
- Complete the app creation on GitHub
- Download the private key (
.pemfile) from GitHub - Return to CLI and enter the App ID shown on GitHub
- Use the file picker to select your downloaded
.pemfile - Wizard creates workflow files in
.github/workflows/
Reusing GitHub Apps:
The same GitHub App can be reused across multiple repositories:
cd another-project
infer chat
> /init-github-action
# Enter the same App ID and use the same private key fileGenerated Workflow Files:
The wizard creates GitHub Action workflows in .github/workflows/infer.yml that:
- Trigger on issue events (opened, edited) and issue comments
- Generate GitHub App tokens for authentication
- Execute AI-powered agents via the
@infermention trigger - Support multiple LLM providers (OpenAI, Anthropic, DeepSeek, etc.)
- Provide full repository access (issues, contents, pull requests)
Example Generated Workflow:
name: Infer
on:
issues:
types:
- opened
- edited
issue_comment:
types:
- created
permissions:
issues: write
contents: write
pull-requests: write
jobs:
infer:
runs-on: ubuntu-24.04
steps:
- name: Generate GitHub App Token
id: generate-token
uses: actions/[email protected]
with:
app-id: ${{ secrets.INFER_APP_ID }}
private-key: ${{ secrets.INFER_APP_PRIVATE_KEY }}
owner: ${{ github.repository_owner }}
- name: Checkout Repository
uses: actions/[email protected]
with:
token: ${{ steps.generate-token.outputs.token }}
- name: Run Infer Agent
uses: inference-gateway/[email protected]
with:
github-token: ${{ steps.generate-token.outputs.token }}
trigger-phrase: '@infer'
model: 'deepseek/deepseek-v4-pro'
max-turns: 50
anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
openai-api-key: ${{ secrets.OPENAI_API_KEY }}
google-api-key: ${{ secrets.GOOGLE_API_KEY }}
deepseek-api-key: ${{ secrets.DEEPSEEK_API_KEY }}Repository Secrets Configuration:
After running the wizard, configure these secrets in your GitHub repository settings:
INFER_APP_ID- Your GitHub App IDINFER_APP_PRIVATE_KEY- Your GitHub App private key (.pem file contents)- Provider API keys (
ANTHROPIC_API_KEY,OPENAI_API_KEY, etc.)
Usage in Issues:
Once configured, mention @infer in any issue or issue comment to activate the agent:
@infer Please analyze this bug and suggest a fixFor more information on the infer-action GitHub Action, see the GitHub Action documentation or the upstream repository.
Custom Shortcuts
Create YAML files in .infer/shortcuts/ directory. Shortcuts support three types:
1. Simple Commands
Execute a single command:
# .infer/shortcuts/simple.yaml
shortcuts:
- name: hello
description: 'Say hello'
command: echo
args:
- 'Hello from Inference Gateway!'2. Shortcuts with Subcommands
Group related commands under a parent shortcut:
# .infer/shortcuts/dev.yaml
shortcuts:
- name: dev
description: 'Development operations'
command: bash
subcommands:
- name: test
description: 'Run all tests'
args:
- -c
- 'go test ./...'
- name: build
description: 'Build the project'
args:
- -c
- 'go build -o app .'Usage: /dev test, /dev build
3. AI-Powered Snippets
Use LLM to generate dynamic content based on command output. The snippet.prompt can reference JSON fields from command output using {fieldName} placeholders, and snippet.template uses {llm} for the AI-generated response:
# .infer/shortcuts/ai-commit.yaml
shortcuts:
- name: ai-commit
description: 'AI-generated commit message'
command: bash
args:
- -c
- |
diff=$(git diff --cached)
jq -n --arg diff "$diff" '{"diff": $diff}'
snippet:
prompt: "Generate commit message for:\n{diff}"
template: '!git commit -m "{llm}"'The command must output JSON. Fields are accessible in the prompt template via {fieldName} syntax. The LLM response is accessible via {llm} in the template.
Advanced Features
Cost Tracking
Real-time token usage and cost calculation displayed in the status bar.
Features:
- Per-model pricing calculation
- Cumulative session costs
- Input and output token tracking
- Status bar indicator
- Custom pricing support
View Costs:
# Costs displayed in status bar during chat
infer chat
# Status bar shows model and current cost
# Inspect a saved conversation's entries (per-entry metadata, including model)
infer conversations show <session-id>
# Same, as one JSON object per line for piping into jq
infer conversations show <session-id> --format json | jq .Pricing Configuration
Pricing lives under the pricing key in .infer/config.yaml. The custom_prices map overrides or adds entries to the built-in per-model pricing table, keyed by model name.
# .infer/config.yaml
pricing:
enabled: true
currency: 'USD'
custom_prices:
'ollama_cloud/deepseek-v4-pro':
input_price_per_mtoken: 0.0
output_price_per_mtoken: 0.0
requires_pro: true| Field | Type | Description |
|---|---|---|
input_price_per_mtoken | number | Cost per 1M prompt (input) tokens, in currency. |
output_price_per_mtoken | number | Cost per 1M completion (output) tokens, in currency. |
requires_pro | boolean | Marks the model as gated behind a paid Pro subscription. Defaults to false. |
Override caveat: a
custom_pricesentry fully replaces the default for that model - it is not merged field by field. Omittingrequires_proin a custom override therefore resets it tofalse, even when the model is flagged Pro by default. Setrequires_pro: trueexplicitly when overriding the pricing of a Pro model.
Model Categories (Free / Paid / Pro)
The model picker shows filter tabs - [1] All, [2] Free, [3] Paid, [4] Pro - and groups models into three disjoint categories:
| Category | Meaning |
|---|---|
| Free | No per-token cost and not gated behind a Pro subscription. |
| Paid | Billed per token. |
| Pro | Gated behind a paid Pro subscription (requires_pro: true). |
Pro is an axis orthogonal to price: an Ollama Cloud Pro model has no per-token cost but is not free, so it is labelled pro subscription rather than free. The marker appears both in the picker rows and in /model autocomplete descriptions:
ollama_cloud/deepseek-v4-pro (1M, pro subscription)
ollama_cloud/deepseek-v4-flash (1M, pro subscription)
ollama_cloud/deepseek-v3.2 (128K, free)
deepseek/deepseek-v4-pro (1M, $1.74/$3.48 per MTok)ollama_cloud/deepseek-v4-pro and ollama_cloud/deepseek-v4-flash are flagged Pro by default. This default Pro set is maintainer-curated (Ollama publishes no stable per-model tier badge) and fully overridable through custom_prices - set requires_pro: true to gate additional models, or override a default Pro model as shown above.
Model Thinking Visualization
Collapsible thinking blocks for models that support thinking (Claude, o1, etc.).
Features:
- Collapsible blocks with first sentence preview
- Ctrl+K keyboard shortcut to toggle
- Theme-aware styling
- Performance optimization (long thinking blocks collapsed by default)
Usage:
infer chat
# Ask complex question requiring reasoning
> "Design a scalable microservices architecture for e-commerce"
# Model's thinking process displayed in collapsible blocks
# Press Ctrl+K to expand/collapse thinkingConversation Management
Storage Backends:
- SQLite (default):
.infer/conversations.db - PostgreSQL: Shared team database
- Redis: High-performance caching
- JSONL: Append-only files under
.infer/conversations/ - In-memory: Temporary sessions
Features:
- Automatic conversation history
- AI-generated titles (batch: 10 messages)
- Token optimization with compaction
- Backend-agnostic inspection via the storage layer (works the same across
jsonl,sqlite,postgres,redis, andmemory)
Subcommands:
list: List saved conversations with metadata (id, title, message/request counts, tokens, cost).show <session-id>: Print a single conversation's entries in chronological order (role, timestamp, content, andtool_call_idfor tool results).
show flags:
--include-hidden: Include entries persisted as hidden - system reminders, plan-approval prompts, drained background-task results, and the synthetic verify message injected byinfer agent. Off by default.--format text|json:text(default) is human-readable;jsonemits one JSON object per line (NDJSON), matching theinfer agentstdout shape for piping intojqor log scrapers.
Session id resolution:
<session-id> is resolved the same way as infer agent --session-id: a literal UUID is used as-is, while any other value is treated as a session group key and resolved to that group's current session id (registering the group if it is new). This means you can show a conversation by group name such as channel-telegram-12345.
Commands:
# List conversations to find a session id
infer conversations list
# Show a conversation's entries (hidden entries omitted by default)
infer conversations show 12345678-1234-1234-1234-123456789abc
# Show by session group name (for example a channel group key)
infer conversations show channel-telegram-12345
# Include hidden entries such as system reminders
infer conversations show <session-id> --include-hidden
# One JSON object per line for piping into jq
infer conversations show <session-id> --format json | jq .MCP Integration
Connect to Model Context Protocol servers for extended capabilities. MCP provides stateless tool execution for external services like databases, file systems, and APIs.
Setup:
Initialize project to create .infer/mcp.yaml:
infer initConfigure MCP servers in .infer/mcp.yaml:
enabled: true
connection_timeout: 30
discovery_timeout: 30
liveness_probe_enabled: true
liveness_probe_interval: 10
servers:
# Auto-start MCP server in container (recommended)
- name: 'demo-server'
enabled: true
run: true
oci: 'mcp-demo-server:latest'
description: 'Demo MCP server'
# Connect to external MCP server
- name: 'filesystem'
url: 'http://localhost:3000/sse'
enabled: true
description: 'File system operations'
exclude_tools:
- 'delete_file'CLI Commands:
# Add auto-start MCP server
infer mcp add my-server --run --oci=my-mcp:latest
# List MCP servers
infer mcp list
# Toggle server
infer mcp toggle my-server
# Remove server
infer mcp remove my-serverUsing MCP Tools:
MCP tools appear as MCP_<server>_<tool> in chat. Example:
infer chat
> "Use the MCP_demo-server_get_time tool to get current time"See MCP documentation for detailed integration guide and server development.
Agent Skills
Reusable, model-readable instruction folders that the agent loads on demand. The CLI uses the same on-disk format as Claude Code, Gemini CLI, and OpenAI Codex CLI, so a skill authored for any of those tools drops into .infer/skills/ unchanged. Skills are enabled by default (since cli#618) - discovered skills are injected into the system prompt out of the box. Only the lightweight metadata (name + description) is added; each SKILL.md body is read on demand. Turn them off with agent.skills.enabled: false, or skip individual skills with disabled_skills.
# .infer/config.yaml
agent:
skills:
enabled: true # default
disabled_skills: [] # optional list of skill names to skip# Discover, install, and remove skills (also available in chat as /skills ...)
infer skills list
infer skills install acme/internal-comms # or a bare name, or a github tree URL
infer skills uninstall internal-commsOnce enabled, invoke a skill explicitly with /<name> (for example /pdf-helper) or by asking the agent to "use the <name> skill"; the CLI deterministically activates it by injecting the skill's metadata and pointing the agent at its SKILL.md. Installed skills under ~/.infer/skills and ./.infer/skills stay readable by the Read tool through a sandbox carve-out, so they load even when the agent runs outside the project directory (for example in CI).
See the full Agent Skills guide for the on-disk layout, the SKILL.md frontmatter contract, install flags, activation triggers, and the sandbox carve-out. To publish a skill in the shared index, see the Skills Catalog.
A2A Integration
Delegate specialized tasks to Agent-to-Agent compatible agents.
Setup:
# Initialize agents configuration
infer agents init
# Add remote agent
infer agents add calendar-agent http://calendar.example.com
# Add local agent with Docker
infer agents add my-agent http://localhost:8081 --oci ghcr.io/myorg/agent:latest --run
# List agents
infer agents list
# View agent details
infer agents show calendar-agentUsage:
infer chat
> "Schedule a meeting tomorrow at 2 PM using the calendar agent"
> /a2a # View connected agentsSee A2A documentation for creating custom agents, or use the ADL CLI to scaffold new A2A agents from YAML definitions.
Parallel Tool Execution
Execute up to 5 tools concurrently for improved performance.
Configuration:
agent:
max_concurrent_tools: 5 # Default: 5Benefits:
- Faster multi-file operations
- Concurrent web fetches
- Parallel code searches
- Reduced total execution time
Workflows
Bug Investigation and Fix
infer chat
# Shift+Tab to Plan Mode
> "Analyze bug in issue #123 and create fix plan"
# Shift+Tab to Standard Mode
> "Implement the fix according to the plan"
# Test and commit
> "Run test suite to verify"
> "/git commit"Feature Development
infer chat
> "Read CONTRIBUTING.md and understand project structure"
# Shift+Tab to Plan Mode
> "Design implementation for user profile feature with avatar upload"
# Shift+Tab twice to Auto-Accept Mode
> "Implement the user profile feature according to the plan"
# Shift+Tab to Standard Mode
> "Review changes and run all tests"Code Review and Refactoring
infer chat
# Plan Mode for analysis
> "Review authentication module for security issues and code quality"
# Standard Mode for implementation
> "Refactor based on recommendations, prioritize security issues"GitHub Issue Resolution
infer agent "Fix the bug described in GitHub issue #456"
# Agent autonomously:
# 1. Fetches issue details
# 2. Analyzes relevant code
# 3. Implements fix
# 4. Runs tests
# 5. Creates commit referencing issueBest Practices
For Beginners
- Start with Plan Mode for unfamiliar code
- Always work in git repositories
- Review diff visualizations before approving
- Begin with simple tasks
For Power Users
- Use Auto-Accept for trusted, repetitive tasks
- Create custom shortcuts for frequent commands
- Combine with scripts for automation
- Leverage A2A for specialized workflows
Performance Tips
- Be specific with file paths and function names
- Use Grep to narrow down relevant files first
- Break large tasks into smaller subtasks
- Provide context with references
Safety
- Review diffs before approving modifications
- Run tests after significant changes
- Have backups before extensive Auto-Accept usage
- Allow-list only trusted commands
- Add sensitive directories to protected paths
Security
Command allow-listing
The Bash tool is default-deny: a command auto-runs only when it matches the per-mode allowed-list for the active agent mode. The effective list is mode.all.allow (the every-mode baseline) unioned with the active mode's own entries:
tools:
bash:
mode:
all:
allow:
- ls( .*)?
- pwd( .*)?
- tree( .*)?
- git status( .*)?
- git diff( .*)?
- npm (install|test|run).*
auto:
allow:
- .* # unrestricted - Auto-Accept mode onlyRead-only gh operations are in the baseline so the agent can inspect GitHub out of the box; writes (gh issue/pr create|edit|comment) and destructive operations (for example gh pr merge, gh repo delete) are not - they fall through to approval. See Default gh allowed-list for the full list.
Entries match the whole command, and a clean-command guard rejects command substitution, multi-command chains/pipelines, file-write redirects, dangerous find actions, and environment-variable leaks before matching. The only thing that lifts the guard is the .* sentinel (Auto-Accept mode).
Protected Paths
Automatically excluded from tool access:
.git/- Repository data*.env- Environment files.infer/- Configuration directory- Custom paths via sandbox config
Approval Workflow
Tool approval has two independent layers - whether an action needs approval, and how that approval is delivered:
- Whether -
tools.safety.require_approval(with per-tool overrides liketools.bash.require_approval/tools.write.require_approval, and for Bash the per-mode allowed-list). - How -
tools.safety.approval_behaviour, one of:
approval_behaviour | How a needed approval is delivered |
|---|---|
prompt (default) | Prompt in the chat TUI; under a channel manager, deliver over IPC; otherwise block. |
ipc | Deliver over IPC when a broker is attached (e.g. the channel manager); otherwise block. |
block | Always reject an approval-requiring action with a reason - never prompt. |
infer config set tools.safety.require_approval true
infer config set tools.safety.approval_behaviour promptLLMs request approval before executing Write/Edit/Delete/Bash operations, with a colored, syntax-aware diff preview for file edits.
Headless secure-by-default
infer agent runs in standard mode, so an off-list or mutating action is not auto-run. With no approver reachable (CI, heartbeat) it is blocked with a reason; under a channel manager (--require-approval) it is sent for IPC approval (for example a Telegram confirmation). There is no .* default - full autonomy is an explicit opt-in (a curated allowed-list, the append override, or mode.auto / .*).
For a CI agent that should edit files and run a curated command set with no interactive approver, use the controlled-autonomy profile - block everything that would need approval, but let the agent write files and run a vetted allowed-list:
tools:
safety:
approval_behaviour: block # reject anything that would otherwise prompt
write:
require_approval: false # ...but let the agent write/edit files freely
bash:
mode:
all:
allow: # curate exactly what may run unattended
- git status( .*)?
- git add( .*)?
- go (build|test)( .*)?Add a couple more commands without touching config via INFER_TOOLS_BASH_ALLOW_APPEND="git commit,git push".
Troubleshooting
Connection Issues
# Check configuration
infer config get
# Verify gateway status
infer status
# Debug mode
infer --debug chatPermission Issues
# Check configuration directory
ls -la ~/.infer/
# Recreate config.yaml from defaults
infer config init --overwrite
# Re-initialize the project
infer initTool Execution Problems
# Inspect tool configuration
infer config get tools
# Check whether a bash command is allowed (without running it)
infer tools validate "git status"
# Enable debug logging
export INFER_LOGGING_DEBUG=true
infer agent "your task"Computer Use Issues
# Verify display server
echo $DISPLAY # Linux/X11
# Check permissions (macOS)
# System Preferences > Security & Privacy > Accessibility
# Test screenshot
infer chat
> "Take a screenshot and describe what you see"Shell Completions Not Working
# Confirm the completion script generates
infer completion zsh | head
# Zsh: the file must live on a directory in $fpath and be named _infer,
# then start a fresh shell
infer completion zsh > "${fpath[1]}/_infer" && exec zsh
# Bash: source the generated file (or place it under a bash-completion dir)
source <(infer completion bash)If completions still do not appear, the shell rc is usually not sourcing the completion file. Verify compinit is called for zsh (or bash-completion is installed for bash), confirm the file path is on $fpath/a bash-completion directory, then start a fresh shell.
Command Reference
| Command | Description |
|---|---|
infer init | Initialize project configuration |
infer status | Check gateway health and resource usage |
infer chat | Interactive chat session (TUI) |
infer chat --web | Web-based terminal interface |
infer agent <task> | Autonomous task execution |
infer skills <subcommand> | Manage Agent Skills (list, install, uninstall) |
infer channels-manager | Start the remote messaging daemon (Channels) |
infer config <subcommand> | Configuration management (init, get, set) |
infer tools <subcommand> | Run agent tools directly (execute, validate) |
infer agents <subcommand> | A2A agent management |
infer conversations <subcommand> | Conversation history management (list, show) |
infer completion <shell> | Generate a shell completion script (bash, zsh, fish, powershell) |
infer version | Show version information (backwards-compatible subcommand) |
infer --version | Show version information (styled by fang) |
infer --help | Display styled help information |
Support and Resources
- Repository: github.com/inference-gateway/cli
- Issues: GitHub Issues
- Releases: GitHub Releases
- Documentation: Full Configuration Reference
The CLI is actively developed with regular updates and new features. Check the repository for the latest releases and announcements.
