Pome is a digital-twin testing platform for AI agents. Instead of running your agent against live GitHub, where rate limits bite, state is impossible to reset, and one noisy collaborator can break your eval, you boot a local GitHub-shaped twin, point your agent at it, run a scenario, score the result deterministically, and reset for the next run. The twin speaks GitHub’s actual REST shape and exposes 35 MCP tools, so your agent code doesn’t change between testing and production.
Why Pome
Evaluating AI agents against live APIs is unreliable in three specific ways: the API drifts between runs, rate limits interrupt long eval suites, and you can’t reset state cleanly between test cases. A shared sandbox account makes things worse: concurrent runs stomp on each other, and a label left over from yesterday’s run breaks today’s assertion.
Pome solves all three problems with a single primitive: a resettable twin. Every scenario run starts from a known seed state you define. The twin enforces GitHub’s real invariants (try to apply a label that doesn’t exist and you get a real 422), so your evaluation is honest. When the run finishes, reset and go again: no cleanup scripts, no shared state, no waiting for rate limits to clear.
Key features
GitHub-shaped twin (REST + MCP)
A real HTTP server with GitHub-shaped REST endpoints and 35 MCP tools, backed by SQLite. Unsupported routes return a loud 501: no silent green.
Markdown scenarios
Define your agent’s task, seed state, and pass/fail criteria in a single Markdown file. No code required to write a test case.
Deterministic evaluation
Criteria are evaluated against the final twin state — not LLM output. A label either exists or it doesn’t. Scores are reproducible across runs and machines.
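As a rough illustration of state-based scoring, the sketch below models a seeded, resettable store and checks criteria against its final state. It is a toy, not Pome's implementation or API: `ToyTwin`, `score`, the state layout, and the criterion names are all hypothetical and exist only to show the idea.

```python
import copy
from typing import Callable

# Hypothetical in-memory stand-in for a twin. Not Pome's code or API.
class ToyTwin:
    def __init__(self, seed: dict):
        # Keep a private copy of the seed so reset() can always restore it.
        self._seed = copy.deepcopy(seed)
        self.state = copy.deepcopy(seed)

    def add_label(self, issue_number: int, label: str) -> int:
        """Apply a label to an issue and return an HTTP-style status code."""
        if issue_number not in self.state["issues"]:
            return 404
        if label not in self.state["labels"]:
            # Mirrors the invariant described earlier: unknown labels are rejected.
            return 422
        labels = self.state["issues"][issue_number]["labels"]
        if label not in labels:
            labels.append(label)
        return 200

    def reset(self) -> None:
        """Discard all changes and return to the seed state."""
        self.state = copy.deepcopy(self._seed)


Criterion = Callable[[dict], bool]

def score(state: dict, criteria: dict[str, Criterion]) -> dict:
    """Evaluate each criterion against the final state only (no model output)."""
    results = {name: bool(check(state)) for name, check in criteria.items()}
    return {"results": results, "passed": sum(results.values()), "total": len(results)}


# Assumed seed and criteria for the illustration.
SEED = {
    "labels": ["bug", "docs"],
    "issues": {1: {"title": "Crash on start", "labels": []}},
}
CRITERIA = {
    "issue 1 is labeled bug": lambda s: "bug" in s["issues"][1]["labels"],
    "issue 1 is not labeled docs": lambda s: "docs" not in s["issues"][1]["labels"],
}

twin = ToyTwin(SEED)
status_ok = twin.add_label(1, "bug")      # label exists in the seed
status_bad = twin.add_label(1, "urgent")  # label was never defined
report = score(twin.state, CRITERIA)

twin.reset()
report_after_reset = score(twin.state, CRITERIA)
```

Because `score` reads only the final state, the same sequence of actions yields the same report on every run, and `reset` guarantees the next run starts from the seed rather than from leftovers.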
CI integration
Drop Pome into GitHub Actions with one workflow step. The official action installs the CLI, runs your scenarios, and uploads the trace as an artifact.
Two ways to run Pome
- Self-host (free, OSS)
- Hosted (managed)
Run the full Pome stack on your own machine or CI runner using Docker Compose. No account, no API key, no network dependency. Everything runs locally: the twin, the evaluator, and the trace artifacts.
Pome is open-source under AGPL-3.0. Internal use is free. If you modify Pome and serve users over a network, AGPL §13 requires you to publish your modifications.
Best for: local development, offline environments, teams who prefer to own their infrastructure.
Next steps
Ready to run your first scenario? The quickstart walks you through booting a twin, running the bundled triage agent, and inspecting your first scored run in under 60 seconds.
Quickstart
Boot the twin, run the bundled agent, and see your first score.
Core concepts
Understand twins, scenarios, sessions, and evaluation before you build.