A Pome scenario is a plain Markdown file. It is human-readable enough for a product manager to review and machine-parseable enough for the Pome CLI to execute without any configuration beyond the file itself. Each scenario defines the world state your agent starts from, the task it should perform, and the criteria Pome uses to decide whether it passed.
Anatomy of a scenario file
Sections
| Section | Required | Purpose |
|---|---|---|
| `# Title` | Yes | Becomes the scenario name in run artifacts and the dashboard. |
| `## Setup` | No | Seeds the twin’s SQLite state before the agent runs. The JSON block is what the parser loads; the surrounding prose is for humans only. |
| `## Prompt` | Yes | The task handed to your agent via the `POME_TASK` environment variable. |
| `## Expected Behavior` | No | Human-readable description for reviewers. Never sent to the agent; available to the evaluator as context. |
| `## Success Criteria` | Yes (≥ 1) | One criterion per line, prefixed with `[D]` or `[P]`. |
| `## Config` | No | Per-scenario settings as bare `key: value` pairs or a fenced YAML block. |
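To make the layout concrete, here is a minimal Python sketch that splits a scenario into the sections listed in the table and reports any required ones that are missing. It is not Pome's parser. The scenario text, `split_sections`, and `missing_sections` are hypothetical names and content invented for illustration, and the sketch does not handle fenced blocks (such as the JSON in `## Setup`) that might contain lines starting with `#`.

```python
# Hypothetical scenario text. The section names come from the table above;
# the title, prose, and criteria wording are invented for illustration.
SCENARIO = """\
# Label a bug report

## Setup
One open issue with no labels.

## Prompt
Apply the bug label to issue #1.

## Success Criteria
[D] Issue #1 has the bug label applied
[P] The agent explains why it chose the label
"""

# Sections marked "Yes" in the Required column (the title is checked separately).
REQUIRED = ("Prompt", "Success Criteria")


def split_sections(text: str) -> tuple[str, dict[str, str]]:
    """Return (title, {section name: body}) for a scenario file."""
    title = ""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and not title:
            title = line[2:].strip()
        elif current is not None:
            sections[current].append(line)
    return title, {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_sections(title: str, sections: dict[str, str]) -> list[str]:
    """List required headings that are absent or empty."""
    missing = [] if title else ["# Title"]
    missing += [f"## {name}" for name in REQUIRED if not sections.get(name)]
    return missing
```

Calling `split_sections(SCENARIO)` yields the title plus a dictionary keyed by section name, and `missing_sections` returns an empty list when the title, prompt, and at least some criteria text are present.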
Criteria types
[D] — Deterministic
Deterministic criteria are evaluated by querying the twin’s final SQLite state directly. No LLM call is made. The CLI matches the criterion text against a library of SQL predicates:
- `Issue #N has the <label> label applied` → queries `issue_labels`
- `Issue #N is assigned to <user>` → queries `issue_assignees`
- `No new labels were created` → diffs the `labels` table between seed and final state
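The matching described above can be pictured as a small table of regular expressions paired with SQL queries. The sketch below is an assumption-heavy illustration, not the CLI's implementation: the table names `issue_labels` and `issue_assignees` come from the list above, but the column names (`issue_number`, `label`, `assignee`), the regexes, and `check_deterministic` are hypothetical. The seed-versus-final diff for `No new labels were created` is left out for brevity.

```python
import re
import sqlite3

# Assumed schema: the docs name the tables but not their columns,
# so issue_number, label, and assignee are hypothetical.
PATTERNS = [
    (
        re.compile(r"^Issue #(\d+) has the (\S+) label applied$"),
        "SELECT COUNT(*) FROM issue_labels WHERE issue_number = ? AND label = ?",
    ),
    (
        re.compile(r"^Issue #(\d+) is assigned to (\S+)$"),
        "SELECT COUNT(*) FROM issue_assignees WHERE issue_number = ? AND assignee = ?",
    ),
]


def check_deterministic(db: sqlite3.Connection, criterion: str):
    """Return True or False if a pattern matches, or None if none does."""
    for pattern, sql in PATTERNS:
        match = pattern.match(criterion.strip())
        if match:
            number, value = match.groups()
            (count,) = db.execute(sql, (int(number), value)).fetchone()
            return count > 0
    return None


# A tiny in-memory stand-in for the twin's final state.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE issue_labels (issue_number INTEGER, label TEXT)")
db.execute("CREATE TABLE issue_assignees (issue_number INTEGER, assignee TEXT)")
db.execute("INSERT INTO issue_labels VALUES (1, 'bug')")
```

With this state, `check_deterministic(db, "Issue #1 has the bug label applied")` passes, while a criterion whose wording matches no pattern returns `None`, signalling that it cannot be checked deterministically.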
[P] — Probabilistic
Probabilistic criteria are evaluated by an LLM judge. You supply your own key (BYOK) — Pome never holds or proxies your LLM credentials. The judge receives the criterion text, the final twin state, and the agent’s stdout, and returns a passed / confidence / reasoning verdict.
If a [D] criterion’s text doesn’t match any known SQL pattern, the CLI automatically falls back to [P] mode and logs a notice so you can sharpen the wording.
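The fallback rule can be sketched as a routing step over the lines of `## Success Criteria`. This is a rough illustration under stated assumptions: `KNOWN_PATTERNS` is a hypothetical stand-in for the CLI's pattern library, `Criterion` and `parse_criteria` are invented names, and the real CLI additionally logs a notice when it falls back.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the CLI's library of recognised [D] wordings.
KNOWN_PATTERNS = [
    re.compile(r"^Issue #\d+ has the \S+ label applied$"),
    re.compile(r"^Issue #\d+ is assigned to \S+$"),
    re.compile(r"^No new labels were created$"),
]


@dataclass
class Criterion:
    declared: str   # "D" or "P", as written in the scenario
    effective: str  # mode actually used after any fallback
    text: str


def parse_criteria(body: str) -> list[Criterion]:
    """Parse one criterion per line and apply the [D] -> [P] fallback."""
    criteria = []
    for line in body.splitlines():
        match = re.match(r"^\s*\[([DP])\]\s+(.+?)\s*$", line)
        if not match:
            continue  # not a criterion line
        declared, text = match.groups()
        effective = declared
        if declared == "D" and not any(p.match(text) for p in KNOWN_PATTERNS):
            effective = "P"  # no SQL predicate matches, so hand it to the judge
        criteria.append(Criterion(declared, effective, text))
    return criteria
```

Keeping both `declared` and `effective` makes it easy to list the criteria that fell back, which are the ones whose wording is worth tightening.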
Config keys
| Key | Type | Default | Notes |
|---|---|---|---|
| `twins` | `string` or `string[]` | inferred from `.pome.json` | Which twin(s) this scenario requires. |
| `timeout` | `number` (seconds) | `180` | Per-run timeout before the agent is killed. |
| `runs` | `number` | `1` | How many times to repeat the scenario. Useful for building confidence in `[P]` criteria. |
| `seed` | `string` | `"default"` | Named seed variant. `"default"` uses the JSON in `## Setup`. |
| `judge` | `string` | env default | Judge model identifier passed as-is to your configured LLM endpoint (e.g., `gpt-4o-mini`, `claude-haiku-4-5`, `anthropic/claude-haiku-4-5` for OpenRouter). |
| `tags` | `string[]` | `[]` | Arbitrary tags for filtering runs with `--tag`. |
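As an illustration of how the bare `key: value` form and the defaults in the table fit together, here is a small Python sketch. It is not Pome's config loader: `parse_config` and `parse_list` are hypothetical, the inline `[a, b]` list syntax is an assumption borrowed from YAML flow style, and the fenced YAML form is not handled. `twins` and `judge` have no entry in `DEFAULTS` because their defaults come from `.pome.json` and the environment.

```python
# Defaults taken from the table above.
DEFAULTS = {"timeout": 180, "runs": 1, "seed": "default", "tags": []}


def parse_list(value: str) -> list[str]:
    """Parse an assumed inline list such as "[smoke, labels]"."""
    items = value.strip("[]").split(",")
    return [item.strip().strip("\"'") for item in items if item.strip()]


def parse_config(body: str) -> dict:
    """Merge bare key: value lines over the documented defaults."""
    # Copy list defaults so callers never mutate DEFAULTS.
    config = {k: (list(v) if isinstance(v, list) else v) for k, v in DEFAULTS.items()}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue  # skip blank or malformed lines
        if key in ("timeout", "runs"):
            config[key] = int(value)
        elif key == "tags" or value.startswith("["):
            config[key] = parse_list(value)
        else:
            config[key] = value.strip("\"'")
    return config
```

For example, a `## Config` body containing `timeout: 60` and `tags: [smoke, labels]` would override those two keys and leave `runs` and `seed` at their defaults.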
Next steps
- Write your first scenario — step-by-step guide with working examples