A twin is a deterministic, resettable, in-process simulation of a real SaaS API that reproduces the response shapes, status codes, and error semantics of the live service. Your agent runs against the twin, Pome records every tool call, and the judge scores the run against the scenario’s acceptance criteria. Pome ships three twins today: GitHub, Stripe, and Slack. Each twin page covers its API surface, env vars, seed shapes, and the bundled scenario catalog for that twin.

Available twins

| Twin | pome run config | Standalone pome twin start | Page |
|---|---|---|---|
| GitHub | twins: ["github"] | Yes | GitHub twin |
| Stripe | twins: ["stripe"] | No (in-process via pome run) | Stripe twin |
| Slack | twins: ["slack"] | No (in-process via pome run) | Slack twin |

Hosted sessions on app.pome.sh spawn the same twin packages inside per-session sandboxes. GitHub and Stripe are available via pome session create; Slack currently runs only locally through pome run.

How scenarios work

A scenario is a markdown file with three parts:
  1. Seed state — the twin state before the agent runs (## Seed State)
  2. Prompt — what the agent is asked to do (## Prompt)
  3. Acceptance criteria — deterministic [D] and probabilistic [P] checks
Set the target twin in ## Config:
twins: ["github"]   # or stripe, slack
timeout: 60
passThreshold: 100
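
To illustrate the scenario layout described above, here is a minimal Python sketch that splits a scenario file into its `## ` sections and reports which of the named ones are missing. It is not part of Pome: the helpers `split_sections` and `missing_sections` are hypothetical, the example scenario text is invented for illustration, and only the three headings named in the text (Seed State, Prompt, Config) are checked, since the heading used for acceptance criteria is not stated here.

```python
import re

# Section names taken from the text above. The acceptance-criteria heading
# is not named there, so this sketch does not check for it.
REQUIRED_SECTIONS = ("Seed State", "Prompt", "Config")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each level-2 heading ('## Title') to the text beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # Start a new section keyed by its heading text.
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def missing_sections(markdown: str) -> list[str]:
    """Return the required section names that the file does not contain."""
    found = split_sections(markdown)
    return [name for name in REQUIRED_SECTIONS if name not in found]


# Hypothetical scenario text, for illustration only.
EXAMPLE = """\
## Config
twins: ["github"]
timeout: 60
passThreshold: 100

## Seed State
(twin state goes here)

## Prompt
(agent instructions go here)
"""

if __name__ == "__main__":
    print(missing_sections(EXAMPLE))
```

A check like this could run before pome run to catch a scenario file with a misspelled or missing heading.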

Run a scenario

pome run scenarios/<scenario>.md --agent "<your agent command>"
During the run, Pome boots the matching twin on a localhost port and injects POME_<TWIN>_REST_URL, POME_<TWIN>_MCP_URL, and POME_AUTH_TOKEN into the agent process.
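
On the agent side, the injected variables can be read at startup. The sketch below shows one way to do that in Python. It assumes that `<TWIN>` is the upper-cased twin name (for example `POME_GITHUB_REST_URL`), which the text above does not spell out, and the helper `twin_env` is hypothetical rather than part of Pome.

```python
import os


def twin_env(twin: str, environ=None) -> dict[str, str]:
    """Collect the Pome-injected settings for one twin.

    Assumption: the <TWIN> placeholder is the upper-cased twin name.
    """
    env = os.environ if environ is None else environ
    prefix = f"POME_{twin.upper()}"
    names = {
        "rest_url": f"{prefix}_REST_URL",
        "mcp_url": f"{prefix}_MCP_URL",
        "auth_token": "POME_AUTH_TOKEN",
    }
    # Fail early with a clear message if the agent was started outside pome run.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError(f"missing Pome variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}


if __name__ == "__main__":
    # Made-up values for illustration; real values are injected by pome run.
    fake_env = {
        "POME_GITHUB_REST_URL": "http://localhost:4000",
        "POME_GITHUB_MCP_URL": "http://localhost:4000/mcp",
        "POME_AUTH_TOKEN": "example-token",
    }
    print(twin_env("github", fake_env))
```

Passing the environment as a parameter keeps the helper easy to test without a running twin. How the token is presented to the twin (header name, scheme) is covered on each twin page and is deliberately left out here.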

GitHub twin

Issues, PRs, labels, CI status, and 10 bundled scenarios.

Stripe twin

PaymentIntents, refunds, x402, and 6 bundled scenarios.

Slack twin

Channels, messaging, and 2 bundled scenarios.