You can add Pome to GitHub Actions with one step. The action installs the CLI, boots a twin, runs your scenarios, scores the results, and uploads trace artifacts to the workflow run. Your pipeline gets a clear exit code: zero means all scenarios passed, and non-zero means a scenario failed or the run hit an error.

Using pome-sh/run-scenarios-action

The simplest usage points the action at a single scenario file:
- uses: pome-sh/run-scenarios-action@v1
  with:
    scenario-path: scenarios/triage-agent.md
To run every scenario in a directory, pass the directory path instead:
- uses: pome-sh/run-scenarios-action@v1
  with:
    scenario-path: scenarios/

Full workflow example

This workflow runs agent evaluation on every push and pull request. It passes your LLM provider key to the agent subprocess through the step's env block, and your Pome API key (for hosted mode) through the hosted-api-key action input.
name: Agent evaluation
on: [push, pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: pome-sh/run-scenarios-action@v1
        with:
          scenario-path: scenarios/
          hosted-api-key: ${{ secrets.POME_API_KEY }}
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Action inputs

| Input | Required | Description |
| --- | --- | --- |
| scenario-path | Yes | Path to a scenario .md file or a directory of scenario files |
| hosted-api-key | No | API key for the hosted Pome control plane. Omit to use a local twin instead |

What the action does

1. Installs the pome CLI

   The action downloads and installs the pome binary appropriate for the runner OS.

2. Boots a twin

   If hosted-api-key is provided, the action provisions a hosted twin session via the Pome API. Otherwise it starts a local twin on the runner using Docker.

3. Runs scenarios and scores them

   The action calls pome run for each scenario file. For [P] criteria, the CLI runs the LLM judge locally using your provider key, so the Pome cloud never sees your key or the agent's output.

4. Uploads trace artifacts

   After scoring, the action uploads the runs/ directory as a GitHub Actions artifact named pome-traces. You can download it from the workflow run page to inspect tool_calls.jsonl, score.json, and state_final.json for any run.
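Once you have downloaded and unpacked the pome-traces artifact, a recursive search is one way to find the three trace files without depending on the directory layout inside runs/, which this page does not specify. A minimal sketch; the function name list_trace_files is hypothetical and not part of the pome CLI:

```shell
# List the trace files named above anywhere under a directory.
# Assumption: only the file names are known, so the search is recursive
# rather than relying on a particular folder structure inside runs/.
list_trace_files() {
  find "${1:-runs}" -type f \
    \( -name tool_calls.jsonl -o -name score.json -o -name state_final.json \) \
    | sort
}
```

Calling list_trace_files with no argument searches ./runs; pass the path of the unpacked artifact to search there instead.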

Setting secrets

Add the following secrets to your repository under Settings → Secrets and variables → Actions:
  • POME_API_KEY — your Pome team API key (required for hosted mode; omit for local twin)
  • Your LLM provider key — one of:
    • ANTHROPIC_API_KEY for Claude models
    • OPENAI_API_KEY for OpenAI models
    • POME_LLM_API_KEY for any other OpenAI-compatible endpoint (set POME_LLM_BASE_URL and POME_LLM_MODEL alongside it)
The CLI auto-detects ANTHROPIC_API_KEY and OPENAI_API_KEY. If you use a different provider, set all three POME_LLM_* variables.
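A preflight check can catch a missing provider key before the run starts. The sketch below only mirrors the rule stated above; the function name detect_llm_provider and the order in which the variables are checked are assumptions for illustration, not the CLI's actual detection logic:

```shell
# Report which documented provider configuration is present in the environment.
# Assumption: the check order (Anthropic, OpenAI, then POME_LLM_*) is chosen
# for this sketch; the CLI's real precedence is not documented on this page.
detect_llm_provider() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
  elif [ -n "${POME_LLM_API_KEY:-}" ] \
    && [ -n "${POME_LLM_BASE_URL:-}" ] \
    && [ -n "${POME_LLM_MODEL:-}" ]; then
    # A custom endpoint needs all three POME_LLM_* variables.
    echo "custom"
  else
    echo "none"
    return 1
  fi
}
```

The function prints "none" and returns non-zero when no complete configuration is found, so a script can stop early with a readable message.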
Exit codes from pome run: 0 means all scenarios met the pass threshold, 1 means at least one scenario scored below the threshold, and 2 or higher indicates an infrastructure error (twin boot failure, auth error, quota exceeded). Your CI step will fail automatically on any non-zero exit.
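If a script needs to treat a low score differently from an infrastructure failure, the three exit-code ranges can be mapped to labels. This is a minimal sketch; the function name classify_pome_exit and the label strings are hypothetical:

```shell
# Map a `pome run` exit status to the three outcomes documented above.
# The labels are illustrative and are not emitted by the pome CLI itself.
classify_pome_exit() {
  case "$1" in
    0) echo "pass" ;;             # every scenario met the pass threshold
    1) echo "below-threshold" ;;  # at least one scenario scored too low
    *) echo "infra-error" ;;      # 2 or higher: twin boot, auth, quota
  esac
}

# Example use (commented out so the sketch has no side effects):
#   status=0
#   pome run scenarios/ --agent "bun run start" || status=$?
#   echo "outcome: $(classify_pome_exit "$status")"
```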

Running locally before pushing

Before committing a new scenario, run it locally to check it passes:
pome run scenarios/ --agent "bun run start"
This uses the same CLI the action runs, against a local twin. Local runs write traces to runs/ so you can inspect them with pome inspect latest before pushing.
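The two commands above can be combined into a small wrapper that opens the latest trace only when the run fails. This is a sketch under assumptions: the function name run_local_check and the POME_BIN override are hypothetical (the override exists only so the wrapper can be exercised without the real CLI), and the scenario path and agent command are the ones used on this page:

```shell
# Run the scenarios locally; on a non-zero exit, inspect the latest trace.
# POME_BIN is a hypothetical override for this sketch and defaults to `pome`.
run_local_check() {
  status=0
  "${POME_BIN:-pome}" run scenarios/ --agent "bun run start" || status=$?
  if [ "$status" -ne 0 ]; then
    # Show the trace for the failed run, but keep the original exit status.
    "${POME_BIN:-pome}" inspect latest || true
  fi
  return "$status"
}
```

Because the wrapper returns the status of pome run, it can be called from a pre-push hook and will block the push on the same conditions that would fail the CI step.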