In this quickstart you will boot a local GitHub-shaped twin with Docker, run the bundled triage agent against it using the Claude Agent SDK, and use the Pome CLI to inspect the scored result. By the end you’ll have seen the full Pome loop — seed state, agent run, deterministic evaluation — on your own machine.
Prerequisites
  • Docker installed and running
  • Bun version 1.3 or later (run bun --version to check)
  • An Anthropic API key for the bundled triage agent
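The Bun requirement can also be checked in a script. The following is a minimal sketch and not part of Pome: `version_ge` is a hypothetical helper that compares only the major and minor components, and it assumes `bun --version` prints a dotted version string such as 1.3.0.

```shell
# Hypothetical helper (not part of Pome): succeeds when version $1 is at
# least version $2. Only major.minor are compared; both arguments must
# contain at least one dot (for example 1.3 or 1.3.0).
version_ge() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}

  if [ "$have_major" -gt "$want_major" ]; then
    return 0
  fi
  if [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; then
    return 0
  fi
  return 1
}
```

One possible use: `version_ge "$(bun --version)" 1.3 || echo "Bun 1.3 or later is required"`.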
1. Clone and boot the twin

Clone the Pome repository and start the GitHub twin with Docker Compose.
git clone https://github.com/pome-sh/pome.git && cd pome
docker compose up -d
Once the container is up, confirm the twin is healthy:
curl http://127.0.0.1:3333/healthz
You should see:
{"status":"ok"}
The twin is now listening on port 3333. It exposes GitHub-shaped REST endpoints and 35 MCP tools, backed by SQLite. The seed data — one open issue in acme/api — is loaded automatically.
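The container may need a moment before the health endpoint answers, so a script that boots the twin might poll rather than call curl once. This is a sketch under assumptions: `wait_for_twin` is a hypothetical helper, and the default of 30 attempts one second apart is an arbitrary choice, not a Pome recommendation. Only the URL comes from this quickstart.

```shell
# Poll the twin's health endpoint until it responds or the attempts run out.
# The URL is the one used in this quickstart; the default attempt count and
# delay are arbitrary choices for this sketch.
wait_for_twin() {
  url=${1:-http://127.0.0.1:3333/healthz}
  attempts=${2:-30}
  delay=${3:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    if curl -fs "$url" >/dev/null 2>&1; then
      echo "twin is healthy"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "twin did not respond after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_twin` with no arguments after `docker compose up -d` would block until the twin responds or roughly 30 seconds have passed.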
2. Run the bundled triage agent

Move into the triage agent example directory, install its dependencies, and run the agent against the twin.
cd examples/triage-agent
bun install
Export your Anthropic API key, then start the agent:
export ANTHROPIC_API_KEY=sk-ant-...
bun run start
The agent reads the seeded open issue in acme/api, classifies it as bug, feature, or question, applies the matching label, and posts a one-sentence reasoning comment. You’ll see its tool calls printed to the terminal as it works.
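If the API key is missing, the agent cannot call the model, so it can help to fail early with a clear message. The helper below is a hypothetical guard, not something the Pome repository is known to ship; it only checks that a named environment variable is set and non-empty.

```shell
# Hypothetical helper (not part of Pome): fails with a message when the
# named environment variable is unset or empty.
require_env() {
  # Indirect lookup of the variable whose name is in $1.
  eval "_require_env_value=\${$1:-}"
  if [ -z "$_require_env_value" ]; then
    echo "missing required environment variable: $1" >&2
    return 1
  fi
  return 0
}
```

It could be combined with the start command as `require_env ANTHROPIC_API_KEY && bun run start`.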
3. Inspect the run

Use the Pome CLI to inspect the scored result of the latest run.
pome inspect latest
You’ll see output similar to this:
Score: 100/100
✓ [D] Issue #1 has the `bug` label applied
✓ [D] Issue #1 is assigned to `alice`
✓ [D] No unsupported endpoint was called
Each line is a criterion from the scenario file. [D] means the criterion was evaluated deterministically against the final twin state, with no LLM judge involved. A score of 100/100 means every criterion passed.

The full run trace (every HTTP call the agent made, request and response bodies, and the final state snapshot) is written to runs/<scenario>/<run-id>/ for offline inspection.
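To browse a trace by hand, you first need to find its directory. The sketch below assumes only the runs/<scenario>/<run-id>/ layout described above and uses directory modification time as a stand-in for "latest", which may not match what `pome inspect latest` selects. `latest_run_dir` is a hypothetical name, not a Pome command.

```shell
# Print the most recently modified run directory under a runs/ root.
# Assumes the runs/<scenario>/<run-id>/ layout described above; the
# function name is hypothetical and not part of the Pome CLI.
latest_run_dir() {
  runs_root=${1:-runs}
  # The trailing slash restricts the glob to directories; -t sorts newest
  # first by modification time and -d lists the directories themselves.
  ls -td "$runs_root"/*/*/ 2>/dev/null | head -n 1
}
```

For example, `ls "$(latest_run_dir)"` would list the files of that run. The file names inside the directory are not documented here, so the sketch does not assume any.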

What just happened

The triage agent read the open issue from the twin’s REST API, used the Claude Agent SDK to classify it, then called the twin’s MCP tools to apply a label and post a comment. Pome recorded every API call the agent made. When the agent finished, Pome evaluated the final twin state against the scenario’s success criteria and computed a deterministic score. Nothing touched real GitHub. The twin enforced real GitHub invariants throughout — if the agent had tried to apply a label that didn’t exist in the seed state, the twin would have returned a real 422 Validation Failed response, just like GitHub would.
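The invariant described above can be probed by hand. The sketch below rests on several assumptions that this quickstart does not confirm: that the twin serves GitHub's "add labels to an issue" route at the same path as GitHub, that it accepts unauthenticated requests, and that the seeded issue is number 1. The helper names and the label `does-not-exist` are hypothetical.

```shell
# Build the JSON body for GitHub's "add labels to an issue" request.
# The label must not contain characters that need JSON escaping.
labels_payload() {
  printf '{"labels":["%s"]}' "$1"
}

# Print only the HTTP status code the twin returns when asked to add a label.
# Assumptions: the twin serves GitHub's route at this path and accepts
# unauthenticated requests; neither is confirmed by this quickstart.
add_label_status() {
  base_url=${1:-http://127.0.0.1:3333}
  label=$2
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(labels_payload "$label")" \
    "$base_url/repos/acme/api/issues/1/labels"
}
```

If those assumptions hold and the twin behaves as described, `add_label_status http://127.0.0.1:3333 does-not-exist` should print the validation failure status rather than a success code.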

Next steps

Write your own scenarios

Define custom seed state and pass/fail criteria for your agent in a single Markdown file.

Twins in depth

Learn what the GitHub twin supports, what it intentionally omits, and how fidelity is enforced.

CI integration

Add Pome to your GitHub Actions workflow with a single step using the official action.