Pome is a digital-twin testing platform for AI agents. Instead of running your agent against live GitHub — where rate limits bite, state is impossible to reset, and one noisy collaborator can break your eval — you boot a local GitHub-shaped twin, point your agent at it, run a scenario, score the result deterministically, and reset for the next run. The twin speaks GitHub’s actual REST shape and exposes 35 MCP tools, so your agent code doesn’t change between testing and production.

Why Pome

Evaluating AI agents against live APIs is unreliable in three specific ways: the API drifts between runs, rate limits interrupt long eval suites, and you can’t reset state cleanly between test cases. A shared sandbox account makes things worse — concurrent runs stomp on each other, and a label left over from yesterday’s run breaks today’s assertion.
Pome solves all three problems with a single primitive: a resettable twin. Every scenario run starts from a known seed state you define. The twin enforces GitHub’s real invariants (try to apply a label that doesn’t exist and you get a real 422), so your evaluation is honest. When the run finishes, reset and go again — no cleanup scripts, no shared state, no waiting for rate limits to clear.

Key features

GitHub-shaped twin (REST + MCP)

A real HTTP server with GitHub-shaped REST endpoints and 35 MCP tools, backed by SQLite. Unsupported routes return a loud 501 — no silent green.
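Because the twin answers with distinct status codes for "invariant violated" (422) and "route not implemented" (501), a test harness can tell the two apart instead of lumping them into a generic failure. The following is a minimal sketch of that idea; the function name `classify_status` and the verdict strings are illustrative assumptions, not part of Pome.

```shell
# classify_status CODE
# Hypothetical helper: maps an HTTP status code to a short verdict.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;           # request accepted
    422) echo "rejected" ;;     # twin enforced an invariant (e.g. unknown label)
    501) echo "unsupported" ;;  # route is not implemented by the twin
    *)   echo "unexpected" ;;   # anything else deserves a closer look
  esac
}
```

A status code can be captured with curl's `-w '%{http_code}'` option and passed in, for example `classify_status "$(curl -s -o /dev/null -w '%{http_code}' "$url")"`, where `$url` is whichever twin route you are probing.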

Markdown scenarios

Define your agent’s task, seed state, and pass/fail criteria in a single Markdown file. No code required to write a test case.

Deterministic evaluation

Criteria are evaluated against the final twin state — not LLM output. A label either exists or it doesn’t. Scores are reproducible across runs and machines.
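To illustrate what a state-based check looks like, here is a small sketch that passes or fails on an exact label match. It is not Pome's evaluator: the `has_label` helper and the hard-coded `final_labels` list are assumptions standing in for whatever final state the twin reports.

```shell
# has_label LABEL
# Hypothetical helper: reads one label name per line on stdin and
# succeeds only if LABEL matches a whole line exactly.
has_label() {
  grep -Fxq -- "$1"
}

# Stand-in for the final twin state (assumed data, for illustration only).
final_labels='bug
needs-triage'

if printf '%s\n' "$final_labels" | has_label "bug"; then
  echo "PASS: label bug present"
else
  echo "FAIL: label bug missing"
fi
```

The check depends only on the recorded state, so running it twice on the same state always gives the same verdict, which is the property that makes scores reproducible.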

CI integration

Drop Pome into GitHub Actions with one workflow step. The official action installs the CLI, runs your scenarios, and uploads the trace as an artifact.

Two ways to run Pome

Run the full Pome stack on your own machine or CI runner using Docker Compose. No account, no API key, no network dependency. Everything runs locally — the twin, the evaluator, and the trace artifacts.
git clone https://github.com/pome-sh/pome.git && cd pome
docker compose up -d
curl http://127.0.0.1:3333/healthz  # → {"status":"ok"}
Pome is open-source under AGPL-3.0. Internal use is free. If you modify Pome and let users interact with it over a network, AGPL §13 requires you to offer those users the source code of your modified version.

Best for: local development, offline environments, teams who prefer to own their infrastructure.
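In scripts and CI jobs, the twin may need a moment after `docker compose up -d` before the health endpoint responds. One way to handle that is a small retry loop around the health check. The sketch below assumes nothing about Pome beyond the `/healthz` URL shown above; `wait_until` is a hypothetical helper name.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS
# tries have been used. Returns 0 on success, 1 on giving up.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until 30 1 curl -fsS http://127.0.0.1:3333/healthz || exit 1` retries the health check up to 30 times, one second apart, and aborts the script if the twin never comes up. The `-f` flag makes curl exit non-zero on HTTP error responses, so a failing endpoint counts as "not ready".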

Next steps

Ready to run your first scenario? The quickstart walks you through booting a twin, running the bundled triage agent, and inspecting your first scored run — in under 60 seconds.

Quickstart

Boot the twin, run the bundled agent, and see your first score.

Core concepts

Understand twins, scenarios, sessions, and evaluation before you build.