In this quickstart you will boot a local GitHub-shaped twin with Docker, run the bundled triage agent against it using the Claude Agent SDK, and use the Pome CLI to inspect the scored result. By the end you’ll have seen the full Pome loop (seed state, agent run, deterministic evaluation) on your own machine.
Prerequisites
- Docker installed and running
- Bun version 1.3 or later (run bun --version to check)
- An Anthropic API key for the bundled triage agent
Clone and boot the twin
Clone the Pome repository and start the GitHub twin with Docker Compose.
Once the container is up, confirm the twin is healthy:
You should see:
The twin is now listening on port 3333. It exposes GitHub-shaped REST endpoints and 35 MCP tools, backed by SQLite. The seed data (one open issue in acme/api) is loaded automatically.
Run the bundled triage agent
Move into the triage agent example directory, install its dependencies, and run the agent against the twin.
Export your Anthropic API key, then start the agent:
The agent reads the seeded open issue in acme/api, classifies it as bug, feature, or question, applies the matching label, and posts a one-sentence comment explaining its reasoning. You’ll see its tool calls printed to the terminal as it works.
Inspect the run
Use the Pome CLI to inspect the scored result of the latest run.
You’ll see output similar to this:
Each line is a criterion from the scenario file. [D] means the criterion was evaluated deterministically against the final twin state, with no LLM judge involved. A score of 100/100 means every criterion passed.
The full run trace (every HTTP call the agent made, including request and response bodies, plus the final state snapshot) is written to runs/<scenario>/<run-id>/ for offline inspection.
What just happened
The triage agent read the open issue from the twin’s REST API, used the Claude Agent SDK to classify it, then called the twin’s MCP tools to apply a label and post a comment. Pome recorded every API call the agent made. When the agent finished, Pome evaluated the final twin state against the scenario’s success criteria and computed a deterministic score. Nothing touched real GitHub. The twin enforced real GitHub invariants throughout: if the agent had tried to apply a label that didn’t exist in the seed state, the twin would have returned a 422 Validation Failed response, just like GitHub would.
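To make the loop above concrete, here is a minimal TypeScript sketch of its three moving parts: a twin that enforces a label invariant, the two write actions the triage agent performs, and deterministic scoring of the final state. Everything in it is hypothetical. TwinState, addLabel, addComment, Criterion, and score are illustrative names, not Pome’s API, and scoring as the percentage of passed criteria is an assumption that is merely consistent with 100/100 meaning every criterion passed. The real twin runs over HTTP and MCP backed by SQLite; this sketch keeps state in memory.

```typescript
// Hypothetical, in-memory model of the Pome loop. None of these names are Pome's API.

type Issue = { number: number; title: string; labels: string[]; comments: string[] };

type TwinState = {
  repo: string;                 // e.g. "acme/api"
  availableLabels: Set<string>; // labels that exist in the seed state
  issues: Map<number, Issue>;
};

type TwinResponse = { status: number; body: unknown };

// Twin-side invariant: only labels present in the seed state may be applied.
function addLabel(state: TwinState, issueNumber: number, label: string): TwinResponse {
  const issue = state.issues.get(issueNumber);
  if (!issue) return { status: 404, body: { message: "Not Found" } };
  if (!state.availableLabels.has(label)) {
    // Reject unknown labels instead of creating them, as the twin described above does.
    return { status: 422, body: { message: "Validation Failed" } };
  }
  if (!issue.labels.includes(label)) issue.labels.push(label);
  return { status: 200, body: issue.labels };
}

// Append a comment to an existing issue.
function addComment(state: TwinState, issueNumber: number, body: string): TwinResponse {
  const issue = state.issues.get(issueNumber);
  if (!issue) return { status: 404, body: { message: "Not Found" } };
  issue.comments.push(body);
  return { status: 201, body: { body } };
}

// A criterion is a pure predicate over the final state, so evaluation is deterministic.
type Criterion = { name: string; check: (state: TwinState) => boolean };
type Result = { name: string; passed: boolean };

// Assumed scoring rule: percentage of criteria that passed, rounded to an integer.
function score(state: TwinState, criteria: Criterion[]): { results: Result[]; score: number } {
  const results = criteria.map((c) => ({ name: c.name, passed: c.check(state) }));
  const passed = results.filter((r) => r.passed).length;
  const total = criteria.length;
  return { results, score: total === 0 ? 0 : Math.round((passed / total) * 100) };
}

// Seed state: one open issue in acme/api (the number and title are made up).
const state: TwinState = {
  repo: "acme/api",
  availableLabels: new Set(["bug", "feature", "question"]),
  issues: new Map<number, Issue>([
    [1, { number: 1, title: "Login returns 500", labels: [], comments: [] }],
  ]),
};

// Simulated agent run. A label outside the seed state is rejected with 422.
const rejected = addLabel(state, 1, "urgent");
addLabel(state, 1, "bug");
addComment(state, 1, "Labeled as a bug because the report describes a server error.");

const triageLabels = ["bug", "feature", "question"];
const criteria: Criterion[] = [
  {
    name: "issue #1 has exactly one triage label",
    check: (s) =>
      (s.issues.get(1)?.labels ?? []).filter((l) => triageLabels.includes(l)).length === 1,
  },
  {
    name: "issue #1 has a reasoning comment",
    check: (s) => (s.issues.get(1)?.comments.length ?? 0) >= 1,
  },
];

// Printed in an illustrative format, not Pome's actual CLI output.
const report = score(state, criteria);
console.log(`urgent label -> ${rejected.status}`);
for (const r of report.results) console.log(`[D] ${r.passed ? "PASS" : "FAIL"} ${r.name}`);
console.log(`${report.score}/100`);
```

Because each criterion only reads the final state, running score twice on the same snapshot always gives the same result, which is what allows a [D] criterion to be checked without an LLM judge.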
Next steps
Write your own scenarios
Define custom seed state and pass/fail criteria for your agent in a single Markdown file.
Twins in depth
Learn what the GitHub twin supports, what it intentionally omits, and how fidelity is enforced.
CI integration
Add Pome to your GitHub Actions workflow with a single step using the official action.