GitHub Action (`infer-action`)

inference-gateway/infer-action is the official GitHub Action wrapper for the infer CLI. It lets you run the Inference Gateway agent from a GitHub Actions workflow so that mentioning a trigger phrase in an issue or comment kicks off an automated, AI-driven response: plan posting, code edits, branch creation, and a pull request - all without leaving GitHub.

Current Version: v0.23.6. Pin to a tagged release (@v0.23.6) rather than @main in production workflows.

When to use it

Use infer-action when you want CI-driven inference instead of an interactive terminal session. Typical scenarios:

Issue triage and automated fixes - mention @infer in an issue; the agent reads it, makes the change, and opens a PR.
Automated code review - have the agent review pull requests on a schedule or on push.
Scheduled agents - cron-driven release notes, changelog drafts, dependency upgrades, drift reports.
Advisory-only workflows - run the agent in comment-only mode (enable-git-operations: false) to post suggestions without modifying the repo.

For local, interactive use the CLI remains the right tool; infer-action is the headless, event-driven counterpart.

Quick Start

Create .github/workflows/infer.yml:

yaml

name: Infer Agent

on:
  issues:
    types: [opened, edited]
  issue_comment:
    types: [created]

permissions:
  issues: write
  contents: write
  pull-requests: write

jobs:
  infer:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-opus-4-8
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Open an issue (or comment on one) containing @infer and the workflow takes over. The agent posts a "cooking" placeholder comment, runs, makes changes if needed, and finishes by either updating the comment with results or opening a pull request.

How it works

Trigger detection - the action inspects github.event.issue.title, github.event.issue.body, or github.event.comment.body for the configured trigger-phrase (default @infer). Comments authored by bot users are ignored to prevent recursion.
Reaction + cooking message - on a hit, the action adds an :eyes: reaction to the trigger comment and posts a placeholder "I'm cooking..." comment. Stale cooking messages from earlier runs are cleaned up.
CLI install - downloads infer at the pinned version and runs infer init --overwrite.
Git config - sets git user.name / user.email to the github-actions[bot] identity so any commits the agent makes have a valid author.
Agent run - executes the agent with the selected model and provider API keys. The bash allow-list is augmented with gh and git unless enable-git-operations: false.
PR creation - when the agent produces file changes, it creates fix/issue-{number}, commits, pushes, and opens a PR titled Fix #{number}: ... with Resolves #{number} in the body.
Result posting - the final comment summarises completed work and the model used, links the PR if any, and appends a footer with token usage, per-session cost, and the agent's tool-call count and success rate (see Result comment).
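The trigger check in the first step can also be mirrored at the job level, so that unrelated issues and comments never start a runner. The fragment below is an optional pre-filter, not something the action requires. It assumes the default @infer phrase and the issues / issue_comment triggers from the Quick Start. GitHub's contains() ignores case, so the guard is deliberately looser than the action's own case-sensitive match, which still runs inside the step.

```yaml
jobs:
  infer:
    # Assumption: default trigger phrase (@infer). This only pre-filters events;
    # the action still performs its own case-sensitive trigger detection.
    if: >-
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@infer')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.title, '@infer') ||
      contains(github.event.issue.body, '@infer')))
    runs-on: ubuntu-24.04
    # steps: unchanged from the Quick Start workflow
```

If you change trigger-phrase, the literal in the guard has to change with it.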

Native reminders

The action composes a default set of CLI system reminders and passes them to the CLI via INFER_REMINDERS_CONFIG. The composed default includes:

Periodic context nudge - a pre_tool / always reminder that keeps the agent focused on the task.
Turns-before-max wrap-up - a pre_tool / always reminder that fires near the turn limit, prompting the agent to wrap up.
Post-tool failure nudge - a post_tool / on_failure reminder that fires only after a failed tool call on writable runs, reminding the agent to retry or ask the user.
Memory nudges - when memory-repo is set, the CLI's built-in memory-consult and memory-hygiene reminders are re-emitted.

To override the composed default and supply your own full set of reminders, use the reminders-config input. When set, it replaces the action's default entirely (it is not merged), so any built-in behaviour you want must be re-declared.

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    reminders-config: |
      enabled: true
      reminders:
        - name: fail-nudge
          hook: post_tool
          trigger: on_failure
          text: "A failed call means the change did not happen"

See the CLI reminders documentation for the full YAML schema, the trigger catalog (always, on_failure), and the INFER_REMINDERS_CONFIG / --reminders-file reference.

System prompt override

The action bundles a default system prompt for each context kind (issue, PR, fork PR, direct). You can override it with one of four inputs:

Input	Context	Template variables
`system-prompt-issue`	Issue-driven runs
`system-prompt-pr`	PR-driven runs (non-fork)	,
`system-prompt-pr-fork`	Fork PR runs (view-only)	, , ,
`system-prompt-direct`	Direct-prompt (manual) runs	(none)

When set, the input replaces the action's bundled default prompt for that context (it is not merged). The resulting prompt (override or bundled) then replaces the CLI's own base system prompt text. The CLI still appends its dynamic context block - skills, memory, tools, sandbox, and bash allow-list information - because the action pins INFER_AGENT_SYSTEM_PROMPT_WITH_DEFAULTS=true (the CLI default, pinned explicitly so a consumer config cannot turn it off).

The bundled defaults carry git-safety instructions (branch-first, commit-per-todo, push, draft PR, finish checklist). If your override omits those instructions, the action emits a ::warning:: in the run log so the lost-work guard is not dropped silently. Prefer custom-instructions to layer extras on top of the default unless you need a full replacement.

Claude Code subscription mode

In Claude Code subscription mode, the action sends its system prompt via INFER_PROMPTS_AGENT_SYSTEM_PROMPT_CLAUDE_CODE. The CLI sends no gateway system prompt in this mode; instead, the action's instructions are appended to Claude Code's own prompt via --append-system-prompt.

Source reference

Action PR: inference-gateway/infer-action#177 - fix: send the agent system prompt via INFER_PROMPTS_AGENT_SYSTEM_PROMPT (dead env var since CLI v0.105.0)

Dynamic model selection

Override the workflow's default model on a per-issue or per-comment basis by including /model provider/model-name in the trigger text:

text

@infer /model deepseek/deepseek-v4-flash please analyze this bug and suggest a fix

The override is parsed by the action's trigger-detection step and exported as INFER_AGENT_MODEL for that run only.
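Because an override can name a different provider than the workflow default, the step needs a key for every provider you want to allow. A minimal sketch, assuming both secrets below exist in your repository:

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    # Default model, used when the trigger text carries no /model override.
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    # Only needed for runs whose trigger text says "/model deepseek/...".
    deepseek-api-key: ${{ secrets.DEEPSEEK_API_KEY }}
```

With this in place, the trigger text shown earlier would run on DeepSeek, while a plain @infer mention keeps using the default model.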

Result comment

When the run finishes, the action updates its comment with a result footer. Alongside the status, model, exit code, and job link, the footer reports:

Duration - wall-clock time of the agent run, formatted human-readably (0s, 1m 0s, 1h 1m 1s) and shown as — when unavailable. The measured window is spawn-to-exit of the infer agent child, so it excludes CLI install and Node setup time. The raw millisecond value is also exposed as the run-duration-ms output.
Tokens - prompt / completion / total token usage, plus the request count.
Cost - per-session input / output / total cost, when the CLI reports pricing.
Tool calls - the total number of tool calls the agent made, with the run's success rate. The rate is succeeded / total (where succeeded = total - failed), so failures are reported in proportion to the total; for example, 2 failed calls out of 12 give an 83% success rate. Any failed calls are listed in a collapsed section just below.

The Tool calls line is only rendered when the agent made at least one tool call:

text

## ✅ Infer Result: Success

**Model:** `anthropic/claude-opus-4-8` · **Exit Code:** `0` · **Duration:** 1m 0s · [View Job](...)

**Tokens:** 18,432 in · 2,106 out · 20,538 total (7 requests)

**Cost:** $0.04 in · $0.02 out · $0.06 total

**Tool calls:** 12 total · 83% success rate

<details><summary>⚠️ 2 failed tool call(s)</summary>
...
</details>

The total and failed tool-call counts are also exposed as the total-tool-calls-count and failed-tool-calls-count outputs for use in downstream steps.
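A downstream step can read those counts to gate the job. The sketch below assumes the action step was given id: infer (a name chosen here for illustration). It recomputes the rate with integer arithmetic, so it truncates where the footer may round differently.

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  id: infer # hypothetical step id, referenced below
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Report tool-call success rate
  if: always()
  env:
    TOTAL: ${{ steps.infer.outputs.total-tool-calls-count }}
    FAILED: ${{ steps.infer.outputs.failed-tool-calls-count }}
  run: |
    # Treat missing outputs as zero so the arithmetic below cannot fail.
    total="${TOTAL:-0}"
    failed="${FAILED:-0}"
    if [ "$total" -gt 0 ]; then
      # succeeded / total, as a truncated integer percentage
      echo "Tool calls: $total total, $(( (total - failed) * 100 / total ))% succeeded"
    else
      echo "No tool calls were made"
    fi
    # Fail the job when any tool call failed (a policy choice, not action behaviour).
    [ "$failed" -eq 0 ]
```

Drop the final line if failed tool calls should only be reported rather than fail the job.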

Inputs

Input	Required	Default	Description
`github-token`	Yes	-	Token used for posting comments, creating branches, and opening PRs.
`github-app-slug`	No	`''`	Slug of the GitHub App whose bot identity authors the agent's commits (e.g. `infer-bot`); resolved via `GET /users/{slug}[bot]`. Falls back to `github-actions[bot]` when empty or on failure.
`model`	Yes	-	Model identifier in `provider/model-name` form (e.g. `anthropic/claude-opus-4-8`).
`trigger-phrase`	No	`@infer`	Phrase that activates the agent. Case-sensitive.
`direct-prompt`	No	`''`	Free-text task to run directly, bypassing issue/comment triggers. When set, the agent runs against this text under `workflow_dispatch` (or any event), commits to a new branch, and opens a PR; the result and PR link go to the job summary. See Direct prompt.
`version`	No	`v0.131.0`	`infer` CLI version to install inside the runner.
`max-turns`	No	`50`	Maximum agent iterations - acts as a runaway-cost guard.
`custom-instructions`	No	`''`	Extra instructions appended to the default system prompt (does not replace the defaults).
`system-prompt-issue`	No	`''`	Overrides the action's bundled system prompt for issue-driven runs. Substitutes . See System prompt override.
`system-prompt-pr`	No	`''`	Overrides the action's bundled system prompt for PR-driven runs (non-fork). Substitutes , . See System prompt override.
`system-prompt-pr-fork`	No	`''`	Overrides the action's bundled system prompt for fork PR runs (view-only). Substitutes , , , . See System prompt override.
`system-prompt-direct`	No	`''`	Overrides the action's bundled system prompt for direct-prompt (`workflow_dispatch`) runs. No variables. See System prompt override.
`bash-whitelist-commands`	No	`''`	Comma-separated commands appended to the agent's bash allow-list (e.g. `npm,yarn,pnpm`).
`bash-whitelist-patterns`	No	`''`	Comma-separated regex patterns appended to the agent's bash allow-list (e.g. `^npm .,^yarn .`).
`enable-git-operations`	No	`true`	When `false`, the agent runs in comment-only mode - `git`/`gh` are not allow-listed and no PRs are created.
`mirror-agent-logs`	No	`true`	When `true` (default), the agent's full stdout/stderr transcript (tool inputs, tool outputs, file contents it read, web-fetch payloads, intermediate text) is mirrored to the Actions run log. Set to `false` to suppress that transcript from the run log. See Agent log mirroring.
`memory-repo`	No	`''`	Git remote URL backing the agent's persistent cross-run memory (ssh or https, e.g. `git@github.com:my-org/agent-memory.git`). Enables the CLI's memory git backend: pull on run start, commit + push when a fact changes. Empty = feature off. See Persistent Agent Memory.
`memory-branch`	No	`''`	Branch of `memory-repo` to sync (`INFER_MEMORY_BACKEND_GIT_BRANCH`). Empty = CLI default (`main`).
`memory-sync-on-start`	No	`''`	Pull memory at run start: `pull` or `off` (`INFER_MEMORY_BACKEND_GIT_SYNC_ON_START`). Empty = CLI default (`pull`).
`memory-sync-on-finish`	No	`''`	Push memory changes at run finish: `push` or `off` (`INFER_MEMORY_BACKEND_GIT_SYNC_ON_FINISH`). Empty = CLI default (`push`).
`memory-deploy-key`	No	`''`	SSH private key (e.g. a deploy key with write access) authenticating an ssh `memory-repo`. Secret, auto-masked. See Persistent Agent Memory.
`memory-token`	No	`''`	Token authenticating an https `memory-repo` (scoped git insteadOf rewrite). Secret, auto-masked. Empty on a same-instance https URL = falls back to `github-token`.
`reminders-config`	No	`''`	Verbatim reminders YAML passed to the CLI via `INFER_REMINDERS_CONFIG`, replacing the action's composed default. Lets a power user take full control of the CLI's native reminders (hooks, triggers, cadences). A supplied config replaces the action's default, so built-in behaviour must be re-covered if desired. Requires Infer CLI >= v0.129.0. See Native reminders and the CLI reminders docs.
`dry-run`	No	`false`	Plan-only local-testing mode (e.g. with `act`): forces the bundled mock agent, simulates every GitHub mutation (`[dry-run] would ...`), and prints the resolved system/task/reminder prompts and tool allow-lists. Reads still run. See Local testing with act.
`mock-agent-scenario`	No	`happy`	Which scenario the bundled mock agent runs under `dry-run`: `happy`, `failures`, `no-todos`, or `empty`.
`anthropic-api-key`	No*	-	Required when using an Anthropic model.
`openai-api-key`	No*	-	Required when using an OpenAI model.
`google-api-key`	No*	-	Required when using a Google/Gemini model.
`deepseek-api-key`	No*	-	Required when using a DeepSeek model.
`groq-api-key`	No*	-	Required when using a Groq model.
`mistral-api-key`	No*	-	Required when using a Mistral model.
`cloudflare-api-key`	No*	-	Required when using a Cloudflare Workers AI model.
`cohere-api-key`	No*	-	Required when using a Cohere model.
`ollama-api-key`	No*	-	Required when using a self-hosted Ollama endpoint.
`ollama-cloud-api-key`	No*	-	Required when using Ollama Cloud.
`moonshot-api-key`	No*	-	Required when using a Moonshot (Kimi) model.
`minimax-api-key`	No*	-	Required when using a MiniMax model.
`nvidia-api-key`	No*	-	Required when using an NVIDIA model.
`zai-api-key`	No*	-	Required when using a ZAI model.

* Provide the key matching the provider of the chosen model. Multiple keys can be supplied so the same workflow handles overrides to different providers.

nvidia-api-key release status: the NVIDIA provider input is available on inference-gateway/infer-action@main and ships in the first tagged release after v0.24.0. Reference @main to use it until that tag is cut, then pin to the tag.
zai-api-key release status: the ZAI provider input is available on inference-gateway/infer-action@main and ships in v0.30.0 and later. Pin to v0.30.0 or later.

The action also accepts seven opt-in OpenTelemetry inputs (otel-*) for exporting run telemetry to an OTLP collector. They are disabled by default and change nothing for existing workflows - see OpenTelemetry export.

It also accepts five inputs for running on a Claude subscription instead of a provider API key (use-claude-code-subscription, claude-code-oauth-token, claude-code-cli-version, claude-code-max-output-tokens, claude-code-thinking-budget). They are off by default - see Claude Code subscription mode.
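Several of the optional inputs from the table compose in a single step. The values below are illustrative only; the allow-list entries reuse the example given in the table.

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    max-turns: 30 # tighter runaway-cost guard than the default of 50
    bash-whitelist-commands: npm,yarn,pnpm # appended to the agent's allow-list
    mirror-agent-logs: false # keep the full transcript out of the run log
    custom-instructions: |
      - Run the test suite before committing.
```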

Outputs

Output	Description
`result`	Human-readable summary of the agent execution.
`exit-code`	Exit code returned by `infer` - non-zero means the agent failed.
`pr-url`	URL of the pull request the agent opened (empty if none). Populated for direct-prompt runs and any run that opens a PR.
`run-duration-ms`	Wall-clock duration of the agent run in milliseconds (0 if unavailable).
`failed-tool-calls-count`	Number of failed tool calls detected in the agent output.
`total-tool-calls-count`	Total number of tool calls the agent made during the run.

Reference outputs in downstream steps via ${{ steps.<id>.outputs.result }}.
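For instance, a later step can surface the pull request link in the job summary. A minimal sketch, assuming the action step carries id: infer (an illustrative name, not one the action mandates):

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  id: infer # hypothetical step id
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Link the agent's pull request
  # pr-url is empty when the run opened no PR, so skip the step in that case.
  if: steps.infer.outputs.pr-url != ''
  env:
    PR_URL: ${{ steps.infer.outputs.pr-url }}
    EXIT_CODE: ${{ steps.infer.outputs.exit-code }}
  run: |
    echo "Agent exit code: $EXIT_CODE" >> "$GITHUB_STEP_SUMMARY"
    echo "Pull request: $PR_URL" >> "$GITHUB_STEP_SUMMARY"
```

Passing the outputs through env rather than interpolating them into the script keeps agent-produced text out of the shell command line.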

Claude Code subscription mode

By default infer-action bills each run against a provider API key (anthropic-api-key, openai-api-key, ...) at pay-per-token rates. Claude Code subscription mode is an alternative: it runs the agent through the Infer CLI's Claude Code mode, billed against your Claude Max or Pro subscription instead of a provider API key.

Enable it with two inputs and no provider key:

yaml

- uses: inference-gateway/infer-action@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-sonnet-4-5-20250929 # provider-prefixed (bare ids also accepted)
    use-claude-code-subscription: true
    claude-code-oauth-token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}

When use-claude-code-subscription: true, the action runs a gated setup step that installs the official claude CLI (npm install -g @anthropic-ai/claude-code@<version>) on the runner, then runs the agent through it with INFER_CLAUDE_CODE_ENABLED=true and INFER_GATEWAY_RUN=false. The runner therefore needs npm - the default ubuntu-24.04 image has it. The step is skipped under dry-run.

Release status: this mode is available on inference-gateway/infer-action@main and ships in the next tagged release after v0.17.2. Pin to that tag once it is cut rather than tracking @main in production workflows.

Minting, storing, and rotating the OAuth token

The mode authenticates with a long-lived Claude subscription OAuth token instead of a provider API key.

Mint the token. On a machine signed in to a Claude Pro or Max account, run:
```bash
claude setup-token
```
It prints a token in the form sk-ant-oat01-..., valid for about one year. A Claude Pro or Max subscription is required to mint and use it.
Store it as a repository secret. Save the token as the CLAUDE_CODE_OAUTH_TOKEN secret (repository Settings -> Secrets and variables -> Actions), then pass it through the claude-code-oauth-token input as shown above. The action keeps the token off $GITHUB_ENV and adds it to its redaction list, so it does not appear in comments, job summaries, or the run log.
Rotate before expiry. Because the token lasts about a year, re-run claude setup-token and update the secret before it expires to avoid failed runs.

No provider API key is needed in this mode. A stray ANTHROPIC_API_KEY in the environment is ignored - the Claude Code CLI strips it - so a run cannot silently fall back to pay-per-token API billing.

Inputs

Input	Required	Default	Description
`use-claude-code-subscription`	No	`false`	Run the agent via the Infer CLI's Claude Code mode, billed against a Claude Max/Pro subscription instead of a provider API key. Installs the `claude` CLI and sets `INFER_CLAUDE_CODE_ENABLED=true` / `INFER_GATEWAY_RUN=false`.
`claude-code-oauth-token`	No*	-	Claude subscription OAuth token (`CLAUDE_CODE_OAUTH_TOKEN`) minted by `claude setup-token`. Required when `use-claude-code-subscription` is `true`.
`claude-code-cli-version`	No	`2.1.187`	Version of the `@anthropic-ai/claude-code` npm package to install for Claude Code mode.
`claude-code-max-output-tokens`	No	-	Optional max output tokens for Claude Code mode (`INFER_CLAUDE_CODE_MAX_OUTPUT_TOKENS`). Empty uses the CLI default.
`claude-code-thinking-budget`	No	-	Optional extended-thinking token budget for Claude Code mode (`INFER_CLAUDE_CODE_THINKING_BUDGET`). Empty uses the CLI default.

* Required only when use-claude-code-subscription: true.

These compose with the standard inputs (github-token, model, max-turns, custom-instructions, bash-whitelist-*, ...). In this mode max-turns bounds both the Infer agent loop (INFER_AGENT_MAX_TURNS) and the Claude CLI turn limit (INFER_CLAUDE_CODE_MAX_TURNS).
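As a sketch of how the mode-specific inputs sit alongside the standard ones, the step below sets all three tuning knobs. The numeric values are illustrative assumptions, not recommended settings.

```yaml
- uses: inference-gateway/infer-action@main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-sonnet-4-5-20250929
    use-claude-code-subscription: true
    claude-code-oauth-token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
    max-turns: 25 # bounds both the Infer agent loop and the claude CLI turn limit
    claude-code-max-output-tokens: 8192 # illustrative value
    claude-code-thinking-budget: 4096 # illustrative value
```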

Model ids

In subscription mode the model input - and any /model override embedded in the trigger text - uses the same anthropic/-prefixed form as the default mode and the pricing table, for example anthropic/claude-sonnet-4-5-20250929. Bare Claude ids such as claude-sonnet-4-5-20250929 are still accepted and normalized for back-compat, so existing workflows keep working. The CLI strips the provider prefix before invoking the claude CLI.

Run metrics in this mode

The result-comment footer renders the full set of metrics here, the same as in the default mode:

Tool calls and failures - the total tool-call count, success rate, and the collapsed failed-call list all work as usual, and the total-tool-calls-count / failed-tool-calls-count outputs are populated.
Token usage and per-session cost - per-turn token usage and per-session cost are reported in Claude Code mode as well, so the Tokens and Cost footer lines render automatically, with no change to your workflow.
Cost is an estimate, not a billed amount. A Claude subscription has no per-call billing, so the cost is estimated from the token counts via the CLI's own pricing list - the same way Claude Code itself estimates cost from a bundled price table. It reflects the equivalent pay-per-token API price, not a charge against your subscription.

Limitations

Claude models only - the agent runs through the claude CLI, so only Claude model ids are valid.
Image inputs are dropped - image content in issues or comments is not forwarded to the model.
Prompt caching is unavailable in this mode.
Fork pull requests do not run this mode. GitHub does not expose repository secrets (including CLAUDE_CODE_OAUTH_TOKEN) to workflows triggered from forks, so the OAuth token is absent there - by design.

Example workflow

A ready-to-run workflow ships at examples/claude-code-subscription.yml. Copy it into .github/workflows/ in your repository:

yaml

# Run the agent on a Claude Max/Pro subscription instead of a provider API key
# (the Infer CLI's "Claude Code mode").
#
# Trigger: open/edit an issue, or post a comment, whose text contains "@infer".
# The agent runs through the official `claude` CLI, billed against your Claude
# subscription rather than a pay-per-token API key.
#
# Setup:
#   1. Run `claude setup-token` locally (needs Claude Pro/Max) to mint a
#      long-lived OAuth token (sk-ant-oat01-..., valid ~1 year).
#   2. Store it as the repository secret CLAUDE_CODE_OAUTH_TOKEN.
#   3. No provider API key is needed.
#
# Note: `model` uses the same anthropic/-prefixed form as the default mode
# (e.g. anthropic/claude-sonnet-4-5-20250929). Bare Claude ids (e.g.
# claude-sonnet-4-5-20250929) are still accepted for back-compat.
#
# Copy this into `.github/workflows/` in your repo.
name: Infer Agent (Claude subscription)

on:
  issues:
    types:
      - opened
      - edited
  issue_comment:
    types:
      - created

permissions:
  issues: write # post the progress/result comment
  contents: write # create the branch and commit changes
  pull-requests: write # open the pull request

jobs:
  infer:
    runs-on: ubuntu-24.04 # needs npm to install the `claude` CLI (the default image has it)
    steps:
      - uses: actions/[email protected]

      - uses: inference-gateway/infer-action@main
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-sonnet-4-5-20250929
          use-claude-code-subscription: true
          claude-code-oauth-token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}

Persistent Agent Memory

infer-action supports the Infer CLI's persistent-memory git backend (inference-gateway/cli#707, shipped in CLI v0.127.0). When enabled, the agent's memory is pulled from a git remote at run start and committed + pushed when a fact changes, giving the agent cross-run memory in CI.

Opt-in and inert by default. When memory-repo is empty (the default), no memory environment variables are set and nothing changes for existing workflows. The feature requires Infer CLI >= v0.127.0; the action's default version pin already satisfies this.

Inputs

Input	Env mapping	Default
`memory-repo`	`INFER_MEMORY_ENABLED=true`, `INFER_MEMORY_BACKEND_TYPE=git`, `INFER_MEMORY_BACKEND_GIT_REPO`	`''` (off)
`memory-branch`	`INFER_MEMORY_BACKEND_GIT_BRANCH`	`''` (CLI default: `main`)
`memory-sync-on-start`	`INFER_MEMORY_BACKEND_GIT_SYNC_ON_START` (`pull` or `off`)	`''` (CLI default: `pull`)
`memory-sync-on-finish`	`INFER_MEMORY_BACKEND_GIT_SYNC_ON_FINISH` (`push` or `off`)	`''` (CLI default: `push`)
`memory-deploy-key`	SSH private key for an ssh `memory-repo` (secret, auto-masked)	`''`
`memory-token`	Token for an https `memory-repo` (secret, auto-masked)	`''`

The action maps these inputs to INFER_MEMORY_* environment variables via $GITHUB_ENV, writing only non-empty values so a consumer's own .infer/memory.yaml is never clobbered. The memory-sync-on-start and memory-sync-on-finish values are validated up front (pull or off for memory-sync-on-start; push or off for memory-sync-on-finish).
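The sync inputs can also make memory read-only for a given workflow: pull at start, never push. A sketch, assuming an agent-memory branch already exists in the workflow repository, so that the github-token fallback described under Auth options applies and no extra secret is needed:

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    memory-repo: https://github.com/${{ github.repository }}
    memory-branch: agent-memory
    memory-sync-on-start: pull # CLI default, spelled out for clarity
    # Quoted so that a YAML parser cannot read the bare word as a boolean.
    memory-sync-on-finish: 'off'
```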

Auth options

The action configures authentication for the memory remote based on the URL scheme of memory-repo:

1. SSH repo + deploy key

Use an SSH remote with a dedicated deploy key. The key is written to ~/.ssh/infer-memory-deploy-key (mode 600), the host is keyscanned, and it is wired via core.sshCommand with IdentitiesOnly=yes.

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    memory-repo: git@github.com:my-org/agent-memory.git
    memory-deploy-key: ${{ secrets.MEMORY_DEPLOY_KEY }}

Store the deploy key as a repository secret (MEMORY_DEPLOY_KEY) and grant it write access to the memory repository.

2. HTTPS repo + token

Use an HTTPS remote with a personal access token or GitHub App installation token. The token is applied as a git insteadOf rewrite scoped to the memory repo URL (never persisted in the memory clone's .git/config).

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    memory-repo: https://github.com/my-org/agent-memory
    memory-token: ${{ secrets.MEMORY_TOKEN }}

Store the token as a repository secret (MEMORY_TOKEN). Both memory-deploy-key and memory-token are auto-masked in logs and redacted from the cooking comment.

3. Workflow repo branch (no extra secret)

When memory-repo is an HTTPS URL on the same GitHub instance and neither memory-token nor memory-deploy-key is set, the action falls back to github-token. This enables the lightest setup: a memory branch of the workflow repository itself, with no extra secret required. The contents: write permission (already needed for PR creation) covers the memory pushes too.

yaml

name: Infer Agent (with memory)

on:
  issues:
    types: [opened, edited]
  issue_comment:
    types: [created]

permissions:
  issues: write
  contents: write
  pull-requests: write

jobs:
  infer:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-opus-4-8
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          memory-repo: https://github.com/${{ github.repository }}
          memory-branch: agent-memory

The action emits a ::notice:: when falling back to github-token so the behaviour is transparent.

Identity

Memory commits are attributed to the same bot identity as the agent's code commits. The action's Configure Git step exports the resolved bot identity as GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME, and GIT_COMMITTER_EMAIL:

<github-app-slug>[bot] when a GitHub App token is used
github-actions[bot] otherwise

These environment variables are set on $GITHUB_ENV so the CLI's memory git backend picks them up for commits made outside the workspace clone.
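One way to obtain a GitHub App token is the actions/create-github-app-token action, which also outputs the app's slug. The sketch below is an assumption about how you might wire the two together, not part of infer-action itself. The APP_ID variable and APP_PRIVATE_KEY secret names are placeholders.

```yaml
- uses: actions/create-github-app-token@v1
  id: app-token
  with:
    app-id: ${{ vars.APP_ID }} # placeholder variable name
    private-key: ${{ secrets.APP_PRIVATE_KEY }} # placeholder secret name

- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ steps.app-token.outputs.token }}
    # Commits are attributed to <slug>[bot]; an empty or unresolvable slug
    # falls back to github-actions[bot].
    github-app-slug: ${{ steps.app-token.outputs.app-slug }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```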

CLI-owned defaults

The following defaults live in the Infer CLI and apply when the corresponding input is left empty:

Setting	CLI default
Branch	`main`
Sync on start	`pull`
Sync on finish	`push`
Per-git-op timeout	`60s`
Commit message	`chore(memory): sync`

The memory-branch, memory-sync-on-start, and memory-sync-on-finish inputs override these only when set to a non-empty value.

Behaviour notes

Best-effort. Memory sync never fails the run. If the remote is unreachable, push fails, or a rebase conflict occurs, the agent run continues and the result comment is posted as usual.
Concurrent runs. When multiple workflow runs push to the same memory remote simultaneously, the CLI reconciles them with a push -> pull-rebase -> retry loop. Where facts conflict, the most recent push wins.
Independent of enable-git-operations. Memory sync works regardless of whether the agent is allowed to create branches and PRs. A comment-only workflow (enable-git-operations: false) can still persist and retrieve memory.
Requires Infer CLI >= v0.127.0. The action's default version pin already satisfies this requirement.

Source references

Action PR: inference-gateway/infer-action#142 - feat: support the persistent memory git backend
CLI PR: inference-gateway/cli#707 - persistent memory git backend
Example workflow: examples/with-memory.yml in the action repository

Direct prompt (manual runs)

Normally the action reads its task from the issue or comment that contains the trigger phrase. To run the agent against a free-text task with no issue or comment - for example from a manual workflow_dispatch form - pass the text through direct-prompt:

yaml

name: Infer (manual)

on:
  workflow_dispatch:
    inputs:
      prompt:
        description: 'Task for the agent to work on'
        required: true
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  infer:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-opus-4-8
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          direct-prompt: ${{ inputs.prompt }}

Trigger it from the Actions tab: pick the workflow, choose Run workflow, and type a task. No issue, comment, or trigger phrase is needed.

When direct-prompt is non-empty:

The agent runs against that text instead of an issue or comment body, so no issues/issue_comment event is required - the action works under workflow_dispatch (or any event).
There is no issue/PR thread to reply to, so the agent commits its work to a new branch and opens a pull request. The run's result and the PR link are written to the workflow job summary, and the PR URL is exposed as the pr-url output.
All other inputs (model, skills, max-turns, bash-whitelist-*, provider keys, ...) compose as usual. A /model override embedded in the prompt text is honoured, just as in event-driven mode.
With enable-git-operations: false, direct-prompt runs in advisory mode: the agent only writes its findings to the job summary, with no branch or PR.

Leave direct-prompt empty (the default) and event-driven behaviour is unchanged.
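Because direct-prompt works under any event, the same step also suits a scheduled job (for example under on: schedule). A minimal sketch of an advisory-only step; the prompt text is purely illustrative:

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    # Hypothetical task; any free text works here.
    direct-prompt: 'Review the README for outdated setup steps and list them.'
    # Advisory mode: findings go to the job summary, with no branch or PR.
    enable-git-operations: false
```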

Local testing with `act`

Set dry-run: true to exercise the whole workflow in a plan-only mode - ideal for trying a workflow locally with act before it runs for real. In dry-run the action:

Forces the bundled mock agent - no real CLI install and no provider token, so it composes with any model without spending anything. (This replaces the former use-mock-agent input; use dry-run: true instead.)
Simulates every GitHub mutation. Instead of adding the :eyes: reaction or creating or updating comments (the "I'm cooking..." comment, comment zones, the spinner), it logs a [dry-run] would ... line. Secret values are still redacted in the printed bodies.
Prints a DRY RUN banner with the exact system / task / reminder prompts and the resolved bash allow-list and web-fetch domains the agent would receive.
Keeps GitHub reads real so you see the actual target issue or PR. Reads fail soft when no token is available - a public-repo read still works unauthenticated; otherwise the run warns and continues.

mock-agent-scenario selects which scripted run the mock agent performs under dry-run: happy (the default - TodoWrite passes, a read, and a commit on a fix/ branch), failures (the happy path with interspersed tool-call failures), no-todos (work without any TodoWrite calls), or empty (exit immediately with no tool calls).
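A step that exercises the failure path might look like the sketch below. It assumes the release you pin already includes the dry-run and mock-agent-scenario inputs; if it does not, reference a ref that does.

```yaml
- uses: inference-gateway/infer-action@v0.23.6
  with:
    # May be empty under act; reads fail soft in that case.
    github-token: ${{ secrets.GITHUB_TOKEN }}
    # Still a required input, but no provider key is needed in dry-run.
    model: anthropic/claude-opus-4-8
    dry-run: true
    mock-agent-scenario: failures # happy | failures | no-todos | empty
```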

The infer-action repo ships ready-to-run local workflows under examples/local/ that run the working-tree action (uses: ./) in dry-run, driven through act by Taskfile helpers:

bash

task test:issue     # issues event
task test:comment   # issue_comment event
task test:direct    # workflow_dispatch / direct-prompt mode
task test:all       # all three

No .env, token, or provider key is required - mutations are simulated and reads fail soft. Pass a token to resolve real reads:

bash

task test:issue -- -s GITHUB_TOKEN=$(gh auth token)

Recipes

PR review on push

Run the agent against every push to a PR branch and have it leave review comments without modifying the code.

yaml

name: AI PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  pull-requests: write
  contents: read

jobs:
  review:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]
        with:
          fetch-depth: 0

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-opus-4-8
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          trigger-phrase: '@review'
          enable-git-operations: false
          custom-instructions: |
            - Focus on correctness, security, and performance.
            - Quote specific files and line numbers.
            - Do not modify code; post review feedback as a comment only.

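Trigger by commenting @review on the PR. GitHub delivers PR comments as issue_comment events rather than pull_request events, so add issue_comment (types: [created]) under on: if a comment should start the run.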

Scheduled summary / drift report

Run a daily agent that reads recent commits and posts a summary to a tracking issue.

yaml

name: Daily Drift Summary

on:
  schedule:
    - cron: '0 7 * * *' # 07:00 UTC daily
  workflow_dispatch:

permissions:
  issues: write
  contents: read

jobs:
  summary:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: deepseek/deepseek-v4-flash
          deepseek-api-key: ${{ secrets.DEEPSEEK_API_KEY }}
          enable-git-operations: false
          max-turns: 20
          custom-instructions: |
            - Read the last 24 hours of commits on main.
            - Post a Markdown summary as a comment on issue #1 (the drift tracker).
            - Group changes by area (api, sdks, docs, infra).

Pair with a tracker issue containing the trigger phrase in its body so the schedule has something to fire against, or use workflow_dispatch to invoke the agent against a freshly created tracker issue.

Agent-driven release notes

On every tag push, generate release notes by running the agent over the commits since the previous tag.

yaml

name: Release Notes

on:
  push:
    tags:
      - 'v*'

permissions:
  contents: write
  issues: write

jobs:
  notes:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/[email protected]
        with:
          fetch-depth: 0

      - uses: inference-gateway/infer-action@v0.23.6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          model: anthropic/claude-opus-4-8
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          bash-whitelist-commands: gh,git
          custom-instructions: |
            - Diff the current tag against the previous tag.
            - Categorise commits using Conventional Commits prefixes.
            - Publish the result via `gh release edit <tag> --notes-file ...`.

Extending the bash allow-list

The default bash allow-list is intentionally narrow (read-only commands plus read-only gh; see Default gh allowed-list). Add what your project needs.

The action exposes inputs that append to the agent's allow-list:

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    bash-whitelist-commands: npm,yarn,pnpm,node,python3,pytest
    bash-whitelist-patterns: '^npm run .*,^yarn .*,^pytest .*'

The added entries are appended to the defaults - they do not replace them.

You can also append at the CLI layer with the INFER_TOOLS_BASH_ALLOW_APPEND environment variable (comma- or newline-separated), which merges into the mode.all baseline that applies in every mode. This is handy for adding a couple of commands, for example letting a release agent commit and push without shipping the unrestricted .* sentinel:

yaml

- uses: inference-gateway/infer-action@v0.23.6
  env:
    INFER_TOOLS_BASH_ALLOW_APPEND: 'git commit,git push'
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Controlled-autonomy CI profile

A headless infer agent is secure-by-default: in CI there is no interactive approver, so any off-list or mutating action is blocked rather than auto-run. To let an unattended agent edit files and run a curated command set without prompting, combine a block approval behaviour with a relaxed write gate and a curated allow-list - set entirely through environment variables on the step:

yaml

- uses: inference-gateway/infer-action@v0.23.6
  env:
    INFER_TOOLS_SAFETY_APPROVAL_BEHAVIOUR: block # reject anything that would otherwise prompt
    INFER_TOOLS_WRITE_REQUIRE_APPROVAL: 'false' # ...but let the agent write/edit files
    INFER_TOOLS_BASH_ALLOW_APPEND: 'git add,git commit,git push'
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

This is the recommended shape for an autonomous CI agent: explicit about exactly what may run unattended, with everything else hard-blocked rather than silently auto-approved.

Agent log mirroring

The mirror-agent-logs input controls whether the agent's full stdout/stderr transcript is mirrored to the GitHub Actions run log. When true (the default), every tool input, tool output, file content the agent reads, web-fetch payload, and intermediate text appears in the live log - useful for debugging and understanding what the agent did.

Why suppress the log?

GitHub Actions run logs are persisted with the workflow run, downloadable as raw logs, and visible to everyone with read access to the repository. For public repositories that means the whole world. While ::add-mask:: redacts known secret values, it cannot catch everything - a private source file, customer data in a fetched page, or a secret printed by a tool could leak into the log. mirror-agent-logs: false is a hard off-switch for the entire transcript.

What is still visible when logs are suppressed?

/tmp/agent-output.txt - the full, unredacted transcript is still written to this file on the runner. It is not uploaded or persisted beyond the job.
Cooking-comment footer - the result comment (status, model, token usage, cost, tool-call stats) is posted to the issue/PR as normal.
Step summary - the job summary ($GITHUB_STEP_SUMMARY) renders in full.
Minimal heartbeat - ticker updates and the final exit code still print, so the step is not completely silent.

Example

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    model: anthropic/claude-opus-4-8
    github-token: ${{ secrets.GITHUB_TOKEN }}
    mirror-agent-logs: false # keep the full agent transcript out of the Actions run log
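
Because the transcript file is still written on the runner, a later step in the same job could check that a run produced output without echoing it. The step below is a hypothetical sketch, not something the action provides; it assumes /tmp/agent-output.txt stays readable by subsequent steps of the job.

```yaml
# Hypothetical follow-up step, placed after the infer-action step in the same job.
- name: Report transcript size
  if: always() # run even when the agent step fails
  run: |
    if [ -f /tmp/agent-output.txt ]; then
      # Print only the byte count; printing the file itself would
      # put the transcript back into the run log.
      wc -c < /tmp/agent-output.txt
    else
      echo "No transcript found at /tmp/agent-output.txt"
    fi
```

Avoid cat or upload steps on this file unless the transcript is known to be safe to expose, since that would undo the effect of mirror-agent-logs: false.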

OpenTelemetry export

infer-action can export OpenTelemetry data about each agent run to any OTLP-compatible collector (an OpenTelemetry Collector, Grafana Alloy, Honeycomb, Tempo, Jaeger, and so on). The feature is opt-in and disabled by default - with otel-exporter-otlp-endpoint empty, nothing is exported and the action behaves exactly as it did before.

Export is best-effort and runs after the user-visible result comment is posted: it never blocks the result, and a slow or unreachable collector never fails the run. Resource attributes and metric / span names follow the OpenTelemetry GenAI semantic conventions, so the data lines up with other GenAI instrumentation in your backend.

Signals

otel-signals chooses what to export (comma-separated); the default is metrics:

metrics - the cheapest, highest-value signal (run-level metrics such as token usage and tool-call counts); exported by default.
traces - one root span per run.
logs - one ERROR record per failed tool call.

Set otel-signals: metrics,traces,logs to enable all three.

Inputs

Each input maps to the environment variable of the same name (upper-cased, with underscores). Most are standard OpenTelemetry variables; OTEL_SIGNALS and OTEL_EXPORT_TIMEOUT_MS are not part of the OpenTelemetry specification.

| Input | Default | Env var | Description |
| --- | --- | --- | --- |
| `otel-exporter-otlp-endpoint` | `''` (disabled) | `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP HTTP endpoint, e.g. `http://localhost:4318`. Empty (the default) disables all export. |
| `otel-exporter-otlp-headers` | `''` | `OTEL_EXPORTER_OTLP_HEADERS` | Comma-separated `key=value` headers, e.g. `Authorization=Bearer my-token`. Treated as secret and auto-masked. |
| `otel-exporter-otlp-protocol` | `http/json` | `OTEL_EXPORTER_OTLP_PROTOCOL` | OTLP transport protocol. Only `http/json` is implemented; gRPC is not supported. |
| `otel-service-name` | `infer-action` | `OTEL_SERVICE_NAME` | Value for the `service.name` resource attribute on exported telemetry. |
| `otel-resource-attributes` | `''` | `OTEL_RESOURCE_ATTRIBUTES` | Extra resource attributes in `key=val,key2=val2` form, appended to the standard set. |
| `otel-signals` | `metrics` | `OTEL_SIGNALS` | Comma-separated signals to export: `metrics` (default), `traces`, `logs`. |
| `otel-export-timeout-ms` | `5000` | `OTEL_EXPORT_TIMEOUT_MS` | Per-request timeout in milliseconds for each OTLP HTTP POST. |

The standard resource attributes attached to every export are service.name, service.version, gen_ai.provider.name, and CI context (cicd.pipeline.*, vcs.repository.*, github.*); otel-resource-attributes appends to that set. Because only http/json is supported, point the endpoint at the collector's HTTP port (4318 on a standard collector), not the gRPC port (4317).

Example

yaml

- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    otel-exporter-otlp-endpoint: https://otel-collector.example.com:4318
    otel-exporter-otlp-headers: ${{ secrets.OTEL_EXPORTER_OTLP_HEADERS }} # e.g. Authorization=Bearer ...
    otel-signals: metrics,traces,logs

Leave otel-exporter-otlp-endpoint unset (the default) and the block above is inert - existing workflows need no changes.
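
The remaining inputs from the table can be combined in the same way. The variation below is a sketch that keeps the default metrics signal and tags the run for easier filtering in the backend. The service name and the attribute keys and values are hypothetical placeholders, and the shorter timeout is only an illustration of the input, not a recommendation from the project.

```yaml
# Sketch only: metrics-only export with custom identity and a tighter timeout.
- uses: inference-gateway/infer-action@v0.23.6
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    otel-exporter-otlp-endpoint: https://otel-collector.example.com:4318 # HTTP port, not gRPC 4317
    otel-service-name: infer-release-agent # hypothetical; default is infer-action
    otel-resource-attributes: team=platform,deployment.environment=ci # hypothetical values
    otel-export-timeout-ms: 2000 # default is 5000
```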

Secrets and least-privilege

Three principles to follow:

Never hardcode API keys. Store them in GitHub Secrets and reference them via ${{ secrets.NAME }}. The action only reads provider keys from inputs, which are injected as environment variables for the agent step.
Grant the minimum workflow permissions for the mode you want. For a PR-creating agent:
```yaml
permissions:
  issues: write
  contents: write
  pull-requests: write
```
For comment-only mode (enable-git-operations: false):
```yaml
permissions:
  issues: write
  contents: read
```
Prefer a GitHub App over the default GITHUB_TOKEN for cross-repo or higher-trust workflows. The CLI's /init-github-action wizard automates the GitHub App setup (see the CLI docs). Using an App token lets PRs created by the agent trigger downstream CI - PRs opened with the default GITHUB_TOKEN do not, by GitHub design.
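
For the GitHub App route, one possible shape is to mint an installation token in an earlier step and pass it as github-token. This is a hand-written sketch, not the wizard's output: it uses the separate actions/create-github-app-token action, and the secret names INFER_APP_ID and INFER_APP_PRIVATE_KEY are hypothetical. It also assumes the action accepts an App installation token wherever it accepts GITHUB_TOKEN.

```yaml
# Sketch only: mint a GitHub App installation token, then hand it to the agent.
steps:
  - uses: actions/create-github-app-token@v1
    id: app-token
    with:
      app-id: ${{ secrets.INFER_APP_ID }} # hypothetical secret name
      private-key: ${{ secrets.INFER_APP_PRIVATE_KEY }} # hypothetical secret name

  - uses: inference-gateway/infer-action@v0.23.6
    with:
      github-token: ${{ steps.app-token.outputs.token }} # App token instead of GITHUB_TOKEN
      model: anthropic/claude-opus-4-8
      anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```

The App itself still needs the repository permissions that match the mode you run in, mirroring the permissions blocks above.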

Additional hardening:

Cap max-turns to prevent runaway loops and bound cost. 30-50 covers most issues.
Set enable-git-operations: false whenever the agent only needs to read and comment.
Keep the bash allow-list narrow - only add commands you trust the agent to invoke.
Pin inference-gateway/infer-action@<tag> (not @main) so an upstream change cannot silently alter behaviour.
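
Taken together, the hardening points above might look like the following comment-only step. It is a sketch built only from inputs described on this page; the max-turns value is simply the lower end of the suggested range.

```yaml
# Sketch only: a pinned, comment-only, turn-capped agent step.
- uses: inference-gateway/infer-action@v0.23.6 # pinned tag, not @main
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    model: anthropic/claude-opus-4-8
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }} # from GitHub Secrets, never hardcoded
    enable-git-operations: false # read and comment only
    max-turns: 30 # bounds cost and prevents runaway loops
    # bash-whitelist-commands is left unset so only the default allow-list applies
```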

CLI wizard integration

The CLI ships an interactive wizard, /init-github-action, that automates everything on this page: creating a GitHub App, registering its credentials as secrets, and writing a workflow file under .github/workflows/infer.yml. Use the wizard for first-time setup; come back to this page when you need to customise inputs, write recipes by hand, or harden secrets handling beyond the defaults.

Related

CLI - the infer binary that infer-action installs and drives.
Configuration - environment variables understood by the gateway and CLI.
infer-action repository - source, releases, and issue tracker.
