The loop
Twins
A twin is an in-process (or hosted) simulation of a real SaaS API. It preserves response shapes and error semantics without calling the live service.

| Twin | Config | What agents exercise |
|---|---|---|
| GitHub | twins: ["github"] | Issues, PRs, labels, comments, commit status |
| Stripe | twins: ["stripe"] | PaymentIntents, refunds, x402 paywalls |
| Slack | twins: ["slack"] | Channels, messages, threads, reactions |
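
To make "preserves response shapes and error semantics" concrete, here is a minimal sketch of the idea in Python. It is not Pome's implementation: `ToyIssueTwin`, its methods, and the simplified 404 body are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIssueTwin:
    """Hypothetical in-memory stand-in for a GitHub-style issues API."""

    issues: dict[int, dict] = field(default_factory=dict)

    def get_issue(self, number: int) -> tuple[int, dict]:
        """Return (status, body), shaped like a REST response."""
        issue = self.issues.get(number)
        if issue is None:
            # Mirror the error semantics: a 404 status with a message body.
            return 404, {"message": "Not Found"}
        return 200, issue

    def add_label(self, number: int, label: str) -> tuple[int, dict]:
        """Add a label if the issue exists; otherwise pass the error through."""
        status, body = self.get_issue(number)
        if status != 200:
            return status, body
        if label not in body["labels"]:
            body["labels"].append(label)
        return 200, body


# No network call is made: all state lives in the twin object.
twin = ToyIssueTwin(issues={42: {"number": 42, "title": "Crash on save", "labels": []}})
print(twin.add_label(42, "bug"))
print(twin.get_issue(7))
```

Because state is held in memory, a check such as "issue #42 has label bug" can read it directly after the agent exits.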
Scenarios
A scenario is a markdown file with three parts:
- ## Seed State: JSON the twin loads before the agent starts.
- ## Prompt: passed to the agent as POME_TASK.
- Acceptance criteria: [D] deterministic checks on twin state or events; [P] probabilistic checks judged by an LLM.
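
The three parts can be pictured with a small Python sketch that splits a scenario into its sections. Several details are assumptions rather than Pome's documented format: the `## Acceptance Criteria` heading, the `- [D] ...` bullet syntax, and the seed-state schema in the sample. The helpers `split_sections` and `parse_criteria` are hypothetical.

```python
import json
import re

# Hypothetical scenario text; the heading names and seed schema are assumed.
SCENARIO = """\
## Seed State
{"issues": [{"number": 42, "labels": []}]}

## Prompt
Label issue #42 as a bug.

## Acceptance Criteria
- [D] issue #42 has label bug
- [P] the agent explained why the label applies
"""


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def parse_criteria(body: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is 'D' or 'P'."""
    return re.findall(r"^- \[([DP])\]\s*(.+)$", body, flags=re.MULTILINE)


sections = split_sections(SCENARIO)
seed = json.loads(sections["Seed State"])            # what the twin would load
task = sections["Prompt"]                            # what the agent would see as POME_TASK
criteria = parse_criteria(sections["Acceptance Criteria"])
print(seed, task, criteria, sep="\n")
```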
Runs
pome run <scenario.md> boots the matching twin, spawns your agent with injected
environment variables (POME_<TWIN>_REST_URL, POME_AUTH_TOKEN, etc.), waits for
the agent to exit, then scores the result.
Hosted is the default — runs record to app.pome.sh. Set
POME_LOCAL=1 for engineer-only in-process local twins.
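
On the agent side, the injected variables might be consumed roughly as follows. This is a sketch under assumptions: `POME_GITHUB_REST_URL` is the `POME_<TWIN>_REST_URL` pattern filled in for the GitHub twin, the Bearer header is a guess at how the token is presented, and `load_pome_env`, `build_request`, and the sample values are hypothetical.

```python
import os
import urllib.request


def load_pome_env(env=os.environ, twin="GITHUB"):
    """Collect the values injected for one twin."""
    return {
        "task": env["POME_TASK"],
        "rest_url": env[f"POME_{twin}_REST_URL"],
        "token": env["POME_AUTH_TOKEN"],
    }


def build_request(cfg, path):
    """Build (but do not send) a request against the twin's REST URL."""
    url = cfg["rest_url"].rstrip("/") + path
    # Assumption: the token is sent as a Bearer Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {cfg['token']}"})


# Stand-in environment so the sketch runs outside of a pome run.
fake_env = {
    "POME_TASK": "Label issue #42 as a bug.",
    "POME_GITHUB_REST_URL": "http://localhost:9999/",
    "POME_AUTH_TOKEN": "example-token",
}
cfg = load_pome_env(fake_env)
req = build_request(cfg, "/repos/acme/app/issues/42")
print(cfg["task"])
print(req.full_url)
```

Reading the base URL from the environment, rather than hard-coding the live API host, is what lets the same agent code talk to a twin during a run.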
Artifacts
Every run writes a directory at runs/<scenario>/<run-id>/:
| File | Contents |
|---|---|
| events.jsonl | Canonical event stream: TwinHttpEvent, LlmCallEvent, RunStartedEvent, etc. |
| score.json | Per-criterion verdicts and aggregate satisfaction (0–100). |
| state_initial.json / state_final.json | Twin state before and after the agent. |
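
A run directory can be inspected with a few lines of Python. The sketch below assumes each event in events.jsonl carries its kind under a `"type"` key and that score.json exposes a `"satisfaction"` field; neither field name is confirmed here, and `read_jsonl` and `summarize_run` are hypothetical helpers. The demo writes a throwaway directory so it runs without a real run.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize_run(run_dir: Path) -> dict:
    """Count events by kind and load the score file."""
    events = read_jsonl(run_dir / "events.jsonl")
    # Assumption: the event kind is stored under a "type" key.
    counts = Counter(event.get("type", "unknown") for event in events)
    score = json.loads((run_dir / "score.json").read_text(encoding="utf-8"))
    return {"event_counts": dict(counts), "score": score}


# Demo data only; a real directory lives at runs/<scenario>/<run-id>/.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "events.jsonl").write_text(
        '{"type": "RunStartedEvent"}\n{"type": "TwinHttpEvent"}\n{"type": "TwinHttpEvent"}\n',
        encoding="utf-8",
    )
    (run_dir / "score.json").write_text('{"satisfaction": 100}', encoding="utf-8")
    summary = summarize_run(run_dir)

print(summary)
```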
tool_calls.jsonl is also written for older clients. New integrations should read
events.jsonl.
Scoring
- [D] deterministic: Pome checks twin state or recorded events directly. Example: “issue #42 has label bug” or “no message containing the secret appears in a public channel.”
- [P] probabilistic: An LLM judge reads the trace and evaluates judgment criteria. Example: “the agent recognized the label was contextually wrong.”
A run passes when its aggregate satisfaction reaches passThreshold (default 100).
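
The sketch below shows how a deterministic check and a threshold comparison could fit together. The aggregation rule (percentage of criteria that passed) and the comparison against the threshold are assumptions made for illustration, not Pome's documented formula. `Verdict`, `check_issue_has_label`, `satisfaction`, and `run_passes` are hypothetical, and the [P] verdict is hard-coded where an LLM judge would normally decide.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    criterion: str
    kind: str      # "D" (deterministic) or "P" (probabilistic)
    passed: bool


def check_issue_has_label(state: dict, number: int, label: str) -> bool:
    """[D]-style check: inspect the final twin state directly."""
    for issue in state.get("issues", []):
        if issue["number"] == number:
            return label in issue.get("labels", [])
    return False


def satisfaction(verdicts: list[Verdict]) -> float:
    """Assumed aggregate: percentage of criteria that passed (0 to 100)."""
    if not verdicts:
        return 0.0
    return 100.0 * sum(v.passed for v in verdicts) / len(verdicts)


def run_passes(verdicts: list[Verdict], pass_threshold: float = 100.0) -> bool:
    """Assumed rule: pass when the aggregate meets the threshold."""
    return satisfaction(verdicts) >= pass_threshold


final_state = {"issues": [{"number": 42, "labels": ["bug"]}]}
verdicts = [
    Verdict("issue #42 has label bug", "D", check_issue_has_label(final_state, 42, "bug")),
    # Stand-in for a judge's decision; no LLM is called in this sketch.
    Verdict("the agent recognized the label was contextually wrong", "P", False),
]
print(satisfaction(verdicts), run_passes(verdicts), run_passes(verdicts, pass_threshold=50))
```

With the default threshold of 100, any single failed criterion fails the run under this assumed rule; lowering the threshold tolerates partial satisfaction.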
When to run pome
Run pome before merging changes that affect agent behavior:
- Prompt or system-instruction edits
- New or modified tools
- Model swaps
- Twin integration changes (new API surface your agent calls)
Next
Quickstart
Install and complete your first run.
Dashboard
Where runs, sessions, and judge feedback live.
pome run
CLI reference for running scenarios.
/test-with-pome
Run scenarios from your coding agent with
/pome-test.