A scenario is the core unit in Pome: each file defines one test your agent must pass. It combines a seed state (the starting world), a task prompt (what the agent is told to do), and success criteria (how Pome scores the result). You write scenarios in plain Markdown; Pome parses them, seeds a fresh twin, runs your agent, and scores the outcome.
Create the file
Create a new Markdown file in your `scenarios/` directory. The filename becomes the scenario slug in traces and run output.

Open the file and add a top-level heading. This becomes the scenario name shown in the CLI output and dashboard.

Add a Setup section
The `## Setup` section seeds the twin’s initial state. Write a short prose description for human readers, then include a fenced JSON block with the exact seed data. Pome uses only the JSON block at runtime; the parser ignores the prose.

Add a Prompt section
The `## Prompt` section contains the task text Pome passes to your agent via the `POME_TASK` environment variable. Write it as you would write a user message to the agent.

Add Success Criteria
The `## Success Criteria` section lists what the agent must achieve. Each line starts with `- [D]` for a deterministic check or `- [P]` for a probabilistic (LLM-judged) check.

`[D]` criteria are evaluated by querying the twin’s SQLite state directly, with no LLM call required. `[P]` criteria are evaluated by an LLM judge that reads the final state and the agent’s output.

Add a Config section
The optional `## Config` section sets per-scenario runtime options as bare key-value pairs.

| Key | Purpose |
|---|---|
| `twins` | Which twin(s) to boot for this scenario |
| `timeout` | Per-run timeout in seconds (default: 180) |
| `runs` | How many times to repeat the scenario (useful for `[P]` confidence) |
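
To make the four sections concrete, here is a rough Python sketch that splits a scenario file into its `##` sections and reads each one the way the steps above describe. It is not Pome's parser. The sample scenario, the seed fields, the `github` twin name, and the `key: value` separator in the Config section are all assumptions made for illustration.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the sample can contain a fenced JSON block

# Hypothetical scenario file: seed fields, twin name, and the "key: value"
# Config syntax are illustrative assumptions, not Pome's documented schema.
SCENARIO = f"""# Label the bug report

## Setup
One open issue with no labels.

{FENCE}json
{{"issues": [{{"number": 1, "title": "Crash on save"}}]}}
{FENCE}

## Prompt
Apply the bug label to issue #1.

## Success Criteria
- [D] Issue #1 has the bug label applied
- [P] The agent explains why it chose the label

## Config
twins: github
timeout: 180
runs: 3
"""


def split_sections(text):
    """Map each '## ' heading to the body text beneath it."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def seed_from_setup(setup):
    """Return the parsed JSON block, ignoring the surrounding prose."""
    match = re.search(FENCE + r"json\n(.*?)\n" + FENCE, setup, re.DOTALL)
    return json.loads(match.group(1)) if match else None


def parse_criteria(body):
    """Return (kind, text) pairs, where kind is 'D' or 'P'."""
    return re.findall(r"^- \[([DP])\] (.+)$", body, re.MULTILINE)


def parse_config(body):
    """Parse 'key: value' lines; the ':' separator is an assumption."""
    config = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


sections = split_sections(SCENARIO)
seed = seed_from_setup(sections["Setup"])
criteria = parse_criteria(sections["Success Criteria"])
config = parse_config(sections["Config"])
```

In this sketch the Setup prose is discarded and only the JSON block is kept, which mirrors the statement that the prose is for human readers only.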
[D] criteria phrase library
The `[D]` evaluator matches criterion text against a small phrase library. Use these exact phrasings to get deterministic, LLM-free scoring:

| Pattern | What it checks |
|---|---|
| `Issue #N has the <label> label applied` | Queries `issue_labels` for a matching row |
| `Issue #N is assigned to <user>` | Queries `issue_assignees` for a matching row |
| `Issue #N still has the <label> label` | Checks the label is present in the final state (no-change variant) |
| `No new labels were created` | Diffs the `labels` table between seed state and final state |
If a `[D]` criterion text doesn’t match any phrase in the library, Pome automatically falls back to `[P]` LLM-judge mode and logs `[D] criterion fell back to [P] mode`. You can sharpen the wording to match a supported pattern, or leave it as `[P]` intentionally.
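
The match-or-fall-back behavior can be pictured with a small Python sketch. The regular expressions below only approximate the four documented phrasings, and the check names (`label_applied` and so on) are hypothetical; Pome's actual matcher may be stricter or looser than this.

```python
import re

# Regexes approximating the documented phrasings. The check names are
# hypothetical labels for this sketch, not identifiers from Pome.
PHRASES = [
    (re.compile(r"^Issue #(\d+) has the (.+) label applied$"), "label_applied"),
    (re.compile(r"^Issue #(\d+) is assigned to (.+)$"), "assigned_to"),
    (re.compile(r"^Issue #(\d+) still has the (.+) label$"), "label_still_present"),
    (re.compile(r"^No new labels were created$"), "no_new_labels"),
]


def classify(criterion):
    """Return (mode, check, args); text matching no phrase falls back to 'P'."""
    text = criterion.strip()
    for pattern, check in PHRASES:
        match = pattern.match(text)
        if match:
            return ("D", check, match.groups())
    return ("P", None, ())


matched = classify("Issue #12 has the bug label applied")
fallback = classify("The issue is labelled correctly")
```

Here `matched` stays in deterministic mode with the issue number and label captured, while `fallback` drops to `P` because its wording matches none of the patterns. Rewording it to one of the table's phrasings would keep it deterministic.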