Pome runs your AI agent against deterministic clones of real SaaS APIs (GitHub, Stripe, …), records every tool call, and scores the result. No network, no rate limits, no flake.

1. Install

npm install -g pome-sh
pome version

2. Log in

pome login
A browser opens, you sign in, and a pme_… key gets written to ~/.pome/credentials.json. CI can skip this and set POME_API_KEY directly.
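For CI, a small preflight check can catch a missing or malformed key before any scenario runs. The sketch below is an illustration only and not part of Pome: `checkPomeApiKey` is a hypothetical helper, and the only details taken from the step above are the `POME_API_KEY` variable name and the `pme_` prefix.

```typescript
// Hypothetical CI preflight helper; not part of the Pome CLI.
type KeyCheck =
  | { ok: true; key: string }
  | { ok: false; reason: string };

// Takes the environment as a plain object so it can be exercised
// without touching process.env.
function checkPomeApiKey(env: Record<string, string | undefined>): KeyCheck {
  const key = env["POME_API_KEY"]?.trim();
  if (!key) {
    return { ok: false, reason: "POME_API_KEY is not set" };
  }
  // Assumption: every key starts with "pme_", as the login step suggests.
  if (!key.startsWith("pme_")) {
    return { ok: false, reason: "POME_API_KEY does not start with pme_" };
  }
  return { ok: true, key };
}
```

In a CI script this could be called as `checkPomeApiKey(process.env)`, failing the job with `reason` when `ok` is false.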

3. Scaffold a project

mkdir my-agent && cd my-agent
pome init
You now have scenarios/, examples/agents/, a pome.config.json, and a few starter scenarios for the GitHub twin.

4. Run your first twin

The scripted starter agent needs no LLM key — it’s the fastest path to a green run.
pome run scenarios/01-bug-happy-path.md \
  --agent "npx tsx examples/agents/scripted-triage-agent.ts"
You should see PASS 100/100. Pome spun up a local GitHub twin, ran the agent against it, and scored the final state. To run the same scenario against a hosted cloud twin instead, add --hosted; the agent code is unchanged and the result shows up in the dashboard:
pome run scenarios/01-bug-happy-path.md --hosted \
  --agent "npx tsx examples/agents/scripted-triage-agent.ts"

5. Inspect the trace

pome inspect latest
The full trace lives under runs/<scenario>/<run-id>/:
  • tool_calls.jsonl — every API call the agent made, in order
  • score.json — which criteria passed and failed
  • state_final.json — what the twin looked like when the agent stopped
Hosted runs also show up at dashboard.pome.sh.
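
Because tool_calls.jsonl is a JSON Lines file, it can be post-processed with a few lines of code. The sketch below parses such a file's text into one value per line. The record schema is not documented on this page, so records are left as `unknown`, and the `parseJsonl` name is hypothetical.

```typescript
// Hypothetical helper: parse JSON Lines text (one JSON value per line).
// The shape of each record in tool_calls.jsonl is not assumed here.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines, including a trailing newline
    try {
      records.push(JSON.parse(trimmed));
    } catch (err) {
      // Report the 1-based line number so a corrupt trace is easy to locate.
      throw new Error(`Invalid JSON on line ${index + 1}: ${(err as Error).message}`);
    }
  });
  return records;
}
```

With Node.js, the input would typically come from `readFileSync` on the run's tool_calls.jsonl path with `"utf8"` encoding, and `parseJsonl(text).length` then gives the number of recorded calls.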

Point your own agent at a twin

Already have an agent? Start a standalone twin and point your client at it:
pome twin start github
# prints POME_GITHUB_REST_URL and POME_AUTH_TOKEN
Any GitHub client (Octokit, gh, curl) that accepts a custom base URL and a bearer token will work unchanged.
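
As an illustration of the base-URL-plus-bearer-token pattern, the sketch below builds a request against the twin from the two values printed by `pome twin start github`. It assumes those values have been exported as environment variables of the same names and that the twin serves GitHub's REST paths. `twinRequest` is a hypothetical helper, not a Pome API.

```typescript
// Hypothetical helper; not part of Pome.
interface TwinRequest {
  url: string;
  headers: Record<string, string>;
}

function twinRequest(
  env: Record<string, string | undefined>,
  path: string,
): TwinRequest {
  // Assumption: both values were exported under the names the CLI prints.
  const baseUrl = env["POME_GITHUB_REST_URL"];
  const token = env["POME_AUTH_TOKEN"];
  if (!baseUrl || !token) {
    throw new Error("POME_GITHUB_REST_URL and POME_AUTH_TOKEN must both be set");
  }
  // Join base and path with exactly one slash between them.
  const url = `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
  return {
    url,
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  };
}
```

On Node.js 18 or later, the result can be passed straight to the built-in `fetch`, for example `fetch(url, { headers })` with a path such as `/repos/acme/demo/issues` (`acme/demo` is a placeholder repository). With Octokit, the same two values map to the `baseUrl` and `auth` constructor options.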

Use cases

PR review agent

Build an agent that reads diffs, leaves review comments, and requests changes, all without touching a real repo.

Triage bot

Classify incoming issues, apply labels, assign owners. Scored against deterministic expected state.

Refund / failed-charge bot

Handle Stripe 402s and refund retries against the Stripe twin, with no test keys and no test customers.

Browse all twins

GitHub and Stripe today, more in closed beta.

Next

  • Twins — what each twin covers and how to drive it.
  • Changelog — what shipped recently.