Pome: test AI agents against a GitHub-shaped twin

Pome is a digital-twin testing platform for AI agents. Instead of running your agent against live GitHub — where rate limits bite, state is impossible to reset, and one noisy collaborator can break your eval — you boot a local GitHub-shaped twin, point your agent at it, run a scenario, score the result deterministically, and reset for the next run. The twin speaks GitHub’s actual REST shape and exposes 35 MCP tools, so your agent code doesn’t change between testing and production.

Why Pome

Evaluating AI agents against live APIs is unreliable in three specific ways: the API drifts between runs, rate limits interrupt long eval suites, and you can’t reset state cleanly between test cases. A shared sandbox account makes things worse — concurrent runs stomp on each other, and a label left over from yesterday’s run breaks today’s assertion.

Pome solves all three problems with a single primitive: a resettable twin. Every scenario run starts from a known seed state you define. The twin enforces GitHub’s real invariants (try to apply a label that doesn’t exist and you get a real 422), so your evaluation is honest. When the run finishes, reset and go again — no cleanup scripts, no shared state, no waiting for rate limits to clear.
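The seed → run → reset cycle and the "real 422" invariant can be pictured with a tiny in-memory toy. This is a minimal sketch under loud assumptions, not Pome's implementation. `SeedState`, `ToyTwin` and `add_label` are hypothetical names. The 422 rule mirrors the invariant described above, and the 404 branch is an extra assumption for unknown issues.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class SeedState:
    """Hypothetical repo state: defined labels and the labels on each issue."""
    labels: set[str] = field(default_factory=set)
    issue_labels: dict[int, set[str]] = field(default_factory=dict)


class ToyTwin:
    """In-memory toy (not Pome's code): seed, let the agent mutate, reset."""

    def __init__(self, seed: SeedState) -> None:
        self._seed = deepcopy(seed)  # pristine copy, never mutated
        self.state = deepcopy(seed)  # working copy the agent mutates

    def add_label(self, issue: int, name: str) -> int:
        """Apply an existing label to an issue; return an HTTP-style status."""
        if issue not in self.state.issue_labels:
            return 404  # assumption: unknown issue
        if name not in self.state.labels:
            return 422  # invariant from the text: the label must already exist
        self.state.issue_labels[issue].add(name)
        return 200

    def reset(self) -> None:
        """Throw away everything the run changed and start again from the seed."""
        self.state = deepcopy(self._seed)


twin = ToyTwin(SeedState(labels={"bug", "triage"}, issue_labels={1: set()}))
print(twin.add_label(1, "bug"))      # 200
print(twin.add_label(1, "wontfix"))  # 422: label was never defined
twin.reset()
print(twin.state.issue_labels)       # {1: set()}
```

Keeping a separate pristine copy of the seed is what makes reset trivial here. No cleanup logic has to undo individual agent actions.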

Key features

GitHub-shaped twin (REST + MCP)

A real HTTP server with GitHub-shaped REST endpoints and 35 MCP tools, backed by SQLite. Unsupported routes return a loud 501 rather than a silent success, so a gap in the twin can never pass as a green run.
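The "loud 501" behaviour amounts to a dispatcher that refuses to guess. The sketch below assumes a hypothetical route table: the templates shown are illustrative and are not Pome's actual list of supported endpoints. `ROUTES`, `matches` and `dispatch` are invented names.

```python
from typing import Callable

Response = tuple[int, dict]

# Hypothetical route table for illustration only; these templates are not
# Pome's actual list of supported endpoints.
ROUTES: dict[tuple[str, str], Callable[[], Response]] = {
    ("GET", "/repos/{owner}/{repo}/labels"): lambda: (200, {"labels": []}),
    ("POST", "/repos/{owner}/{repo}/issues/{number}/labels"): lambda: (200, {}),
}


def matches(template: str, path: str) -> bool:
    """True if path fits template, treating {placeholders} as wildcards."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    return len(t_parts) == len(p_parts) and all(
        t.startswith("{") or t == p for t, p in zip(t_parts, p_parts)
    )


def dispatch(method: str, path: str) -> Response:
    """Route a request; anything the twin does not model is a 501."""
    for (route_method, template), handler in ROUTES.items():
        if route_method == method and matches(template, path):
            return handler()
    # Fail loudly: an unmodelled route is a 501, never a fake 200.
    return 501, {"message": f"Not implemented by twin: {method} {path}"}


print(dispatch("GET", "/repos/acme/api/labels")[0])  # 200
print(dispatch("DELETE", "/repos/acme/api")[0])      # 501
```

The important branch is the fallthrough at the end. Returning an empty 200 there would let an agent "succeed" against behaviour the twin never implemented.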

Markdown scenarios

Define your agent’s task, seed state, and pass/fail criteria in a single Markdown file. No code required to write a test case.

Deterministic evaluation

Criteria are evaluated against the final twin state — not LLM output. A label either exists or it doesn’t. Scores are reproducible across runs and machines.
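Scoring against final state rather than model output reduces to running pure predicates over a snapshot. This is a minimal sketch assuming a hypothetical state shape (`issue_labels`, `closed_issues`) and criteria expressed as named Python predicates. Pome's own criteria are written in the Markdown scenario, and its evaluator may work differently.

```python
from typing import Any, Callable

State = dict[str, Any]
Criterion = tuple[str, Callable[[State], bool]]


def evaluate(state: State, criteria: list[Criterion]) -> tuple[float, dict[str, bool]]:
    """Check each criterion against the final state; score = fraction passed."""
    results = {name: bool(check(state)) for name, check in criteria}
    score = sum(results.values()) / len(results) if results else 0.0
    return score, results


# Hypothetical final state captured from the twin after the agent finished.
final_state: State = {"issue_labels": {1: {"bug"}}, "closed_issues": set()}

criteria: list[Criterion] = [
    ("issue #1 labeled 'bug'", lambda s: "bug" in s["issue_labels"][1]),
    ("issue #1 still open", lambda s: 1 not in s["closed_issues"]),
    ("issue #1 labeled 'triage'", lambda s: "triage" in s["issue_labels"][1]),
]

score, results = evaluate(final_state, criteria)
print(f"{score:.2f}")  # 0.67
print(results)
```

Because each predicate reads only the snapshot, evaluating the same final state twice always yields the same score. That property is what makes results comparable across runs and machines.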

CI integration

Drop Pome into GitHub Actions with one workflow step. The official action installs the CLI, runs your scenarios, and uploads the trace as an artifact.

Two ways to run Pome

Self-host (free, OSS)

Run the full Pome stack on your own machine or CI runner using Docker Compose. No account, no API key, no network dependency: the twin, the evaluator, and the trace artifacts all run locally.

git clone https://github.com/pome-sh/pome.git && cd pome
docker compose up -d
curl http://127.0.0.1:3333/healthz  # → {"status":"ok"}

Pome is open source under AGPL-3.0. Internal use is free. If you modify Pome and let users interact with it over a network, AGPL §13 requires you to make the source of your modified version available to those users.

Best for: local development, offline environments, and teams who prefer to own their infrastructure.
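After `docker compose up -d`, the container may need a moment before `/healthz` answers. A CI step can poll the endpoint until it does. The sketch below uses only the Python standard library and assumes just what the curl example shows: port 3333 and a `{"status":"ok"}` body. The helper names `parse_health`, `twin_is_healthy` and `wait_for_twin`, plus the retry defaults, are hypothetical.

```python
import json
import time
import urllib.request


def parse_health(body: bytes) -> bool:
    """True only for a JSON object whose "status" field is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def twin_is_healthy(base_url: str = "http://127.0.0.1:3333", timeout: float = 2.0) -> bool:
    """Single probe of GET {base_url}/healthz."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False


def wait_for_twin(
    base_url: str = "http://127.0.0.1:3333", attempts: int = 30, delay: float = 1.0
) -> bool:
    """Poll until the twin reports healthy or the attempts run out."""
    for attempt in range(attempts):
        if twin_is_healthy(base_url):
            return True
        if attempt < attempts - 1:  # no pointless sleep after the last try
            time.sleep(delay)
    return False
```

Calling `wait_for_twin()` before launching the agent turns a slow container start into a bounded wait instead of a spurious failure.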

Hosted (managed)

Use the Pome hosted control plane at api.pome.sh. Pome provisions twin sessions on demand, stores your run traces and scores, and gives you a dashboard to inspect results. Your agent talks to a twin URL that Pome manages; you bring your own LLM key for the evaluator.

pome login
pome run scenarios/triage-agent.md

The hosted plan includes a free tier. See Plans & Pricing for session quotas and Pro limits.

Best for: teams who want managed infrastructure, shared run history, and CI without running Docker in every workflow.

Next steps

Ready to run your first scenario? The quickstart walks you through booting a twin, running the bundled triage agent, and inspecting your first scored run — in under 60 seconds.

Quickstart

Boot the twin, run the bundled agent, and see your first score.

Core concepts

Understand twins, scenarios, sessions, and evaluation before you build.
