Pome runs your AI agent against deterministic clones of real SaaS APIs (GitHub, Stripe, …), records every tool call, and scores the result. No network, no rate limits, no flake.
1. Install
2. Log in
Your pme_… key is written to ~/.pome/credentials.json. In CI, skip this step and set POME_API_KEY directly.
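If you wire Pome into your own tooling, the key can come from two places: the file written by the login step, or the CI environment variable. Below is a minimal Python sketch of that lookup. It assumes the environment variable takes precedence and that the credentials file is a JSON object with an api_key field. Both the precedence and the field name are assumptions, not documented behavior.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical field name: the real schema of ~/.pome/credentials.json is not documented here.
KEY_FIELD = "api_key"


def resolve_pome_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Optional[Path] = None,
) -> Optional[str]:
    """Return the Pome API key, preferring POME_API_KEY over the credentials file."""
    env = os.environ if env is None else env
    if credentials_path is None:
        credentials_path = Path.home() / ".pome" / "credentials.json"
    # CI path (assumed to take precedence): the key is injected via the environment.
    key = env.get("POME_API_KEY")
    if key:
        return key
    # Local path: fall back to the file written by the login step.
    if credentials_path.is_file():
        data = json.loads(credentials_path.read_text())
        return data.get(KEY_FIELD)
    return None
```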
3. Scaffold a project
This creates scenarios/, examples/agents/, a pome.config.json, and a few starter scenarios for the GitHub twin.
4. Run your first twin
The scripted starter agent needs no LLM key, so it’s the fastest path to a green run. A successful run ends with PASS 100/100: Pome spun up a local GitHub twin, ran the agent against it, and scored the final state.
To run against a cloud twin instead (same code, with the result visible in the dashboard):
5. Inspect the trace
Each run writes its artifacts to runs/<scenario>/<run-id>/:
- tool_calls.jsonl: every API call the agent made, in order
- score.json: which criteria passed and failed
- state_final.json: what the twin looked like when the agent stopped
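The trace files are plain JSON and JSON Lines, so a few lines of code are enough to post-process them. The Python sketch below counts calls per endpoint and loads score.json as-is. The method and path fields on each tool call are hypothetical. Check a real tool_calls.jsonl and adjust them to match.

```python
import json
from collections import Counter
from pathlib import Path


def load_tool_calls(run_dir: Path) -> list[dict]:
    """Parse tool_calls.jsonl: one JSON object per non-empty line, in call order."""
    lines = (run_dir / "tool_calls.jsonl").read_text().splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def summarize_run(run_dir: Path) -> dict:
    """Count tool calls per endpoint and attach the run's score.json unchanged."""
    calls = load_tool_calls(run_dir)
    # Hypothetical per-call fields "method" and "path"; adapt to the real schema.
    per_endpoint = Counter(f"{c.get('method')} {c.get('path')}" for c in calls)
    score = json.loads((run_dir / "score.json").read_text())
    return {"num_calls": len(calls), "per_endpoint": dict(per_endpoint), "score": score}
```

Usage: summarize_run(Path("runs") / scenario / run_id), where scenario and run_id come from the directory layout above.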
Point your own agent at a twin
Already have an agent? Start a standalone twin and point your client at it. Any client (for example gh or curl) that respects a base URL and bearer token will work unchanged.
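Pointing a client at a twin means swapping the host and the token. Paths and headers stay GitHub-shaped. Below is a standard-library Python sketch. It assumes the twin serves GitHub's REST paths directly under its base URL. The URL and token in the usage note are placeholders, not Pome defaults.

```python
import urllib.request
from typing import Optional


def twin_request(
    base_url: str,
    token: str,
    path: str,
    method: str = "GET",
    body: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build a GitHub-style REST request aimed at a twin instead of api.github.com."""
    # Only the host changes; the path is the same one you would send to GitHub.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

Usage, with placeholder values: build req = twin_request("http://localhost:8080", "<token>", "/repos/octo/demo/issues"), then pass it to urllib.request.urlopen and parse the response with json.load.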
Use cases
PR review agent
Build an agent that reads diffs, leaves review comments, and requests changes, without ever touching a real repo.
Runnable example →
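For the PR review agent, the request-changes step maps onto GitHub's "create a review for a pull request" endpoint. The sketch below only builds the request path and body. It assumes the GitHub twin implements that endpoint with GitHub's own fields: event, body, and comments carrying path, line, side, and body. Confirm this against the twin before relying on it.

```python
def review_endpoint(owner: str, repo: str, pull_number: int) -> str:
    """Path for GitHub's 'create a review for a pull request' call (POST)."""
    return f"/repos/{owner}/{repo}/pulls/{pull_number}/reviews"


def request_changes_payload(summary: str, findings: list[tuple[str, int, str]]) -> dict:
    """Build a review body that requests changes.

    findings: (file path, line in the new version of the file, comment text).
    """
    return {
        "event": "REQUEST_CHANGES",
        "body": summary,
        "comments": [
            # side="RIGHT" anchors the comment on the added/changed side of the diff.
            {"path": path, "line": line, "side": "RIGHT", "body": text}
            for path, line, text in findings
        ],
    }
```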
Triage bot
Classify incoming issues, apply labels, and assign owners. Scored against a deterministic expected state.
Runnable example →
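Scoring against a deterministic expected state can be illustrated with a small comparison function. This is a hypothetical Python sketch, not Pome's scorer. The state shape (issues keyed by number, each with labels and assignees) is an assumption. So is the integer 0 to 100 score, which was chosen to mirror the PASS 100/100 output.

```python
def score_triage(final_state: dict, expected: dict) -> tuple[int, list[str]]:
    """Compare each expected issue's labels and assignees with the final twin state.

    Returns (score out of 100, names of failed criteria). Both inputs are plain
    data, so the same final state always produces the same score.
    """
    issues = final_state.get("issues", {})
    failed: list[str] = []
    total = 0
    for number, want in expected.items():
        got = issues.get(number, {})
        for field in ("labels", "assignees"):
            if field not in want:
                continue  # only fields listed in the expectation become criteria
            total += 1
            # Compare as sets: the order of labels or assignees does not matter.
            if set(got.get(field, [])) != set(want[field]):
                failed.append(f"issue {number}: {field}")
    passed = total - len(failed)
    score = 100 if total == 0 else (100 * passed) // total
    return score, failed
```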
Refund / failed-charge bot
Handle Stripe 402s and refund retries against the Stripe twin, with no test keys and no test customers.
Runnable example →
Browse all twins
GitHub and Stripe today, more in closed beta.
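For the refund / failed-charge bot, the key distinction is between failures worth retrying and failures that are final. On Stripe's API, a 402 means the request was valid but failed, for example because a card was declined. 429 and 5xx responses are transient, and an Idempotency-Key header makes a retried request take effect at most once. The Python sketch below encodes that policy. The send callable, which should put the key into the Idempotency-Key header, is hypothetical, and so are the outcome names. The sketch also assumes the Stripe twin mirrors these status codes.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes an idempotency key, sends the request with it in the
# Idempotency-Key header, and returns (HTTP status, parsed JSON body).
Send = Callable[[str], Tuple[int, Dict]]


def stripe_call_with_retries(
    send: Send,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Dict]:
    """Retry transient failures; treat 402 as a final, reportable failure."""
    # One key for every attempt, so a retried refund is applied at most once.
    key = str(uuid.uuid4())
    body: Dict = {}
    for attempt in range(max_attempts):
        status, body = send(key)
        if 200 <= status < 300:
            return "ok", body
        if status == 402:
            # Valid request that failed (e.g. card declined): retrying will not help.
            return "failed", body
        if status == 429 or status >= 500:
            # Transient: back off exponentially unless this was the last attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
            continue
        # Any other 4xx points at a bug in the request itself.
        return "error", body
    return "gave_up", body
```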