A Pome scenario is a plain Markdown file. It is human-readable enough for a product manager to review and machine-parseable enough for the Pome CLI to execute without any configuration beyond the file itself. Each scenario defines the world state your agent starts from, the task it should perform, and the criteria Pome uses to decide whether it passed.

Anatomy of a scenario file

# Triage open issues in acme/api

## Setup
A repository acme/api with labels bug, feature, and question. Collaborators
alice (owns orders) and bob (owns auth). Open issue #1: "500 error on
POST /orders after deploy".

```json
{
  "repositories": [
    {
      "owner": "acme",
      "name": "api",
      "labels": [
        { "name": "bug",      "color": "d73a4a" },
        { "name": "feature",  "color": "0e8a16" },
        { "name": "question", "color": "cccccc" }
      ],
      "collaborators": ["alice", "bob"],
      "issues": [
        {
          "number": 1,
          "title": "500 error on POST /orders after deploy",
          "body": "Started failing right after the 14:00 deploy. Stack trace points to OrderController#create.",
          "state": "open",
          "labels": [],
          "assignees": []
        }
      ]
    }
  ]
}
```

## Prompt
Triage issue #1 in acme/api.

## Expected Behavior
The agent reads the issue, recognizes it as a bug, applies the bug label,
assigns alice (orders area), and stops.

## Success Criteria
- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
- [D] No new labels were created
- [P] The agent's final summary mentions "bug" and "alice"

## Config
twins: github
timeout: 60
runs: 1
judge: claude-haiku-4-5

Sections

| Section | Required | Purpose |
| --- | --- | --- |
| `# Title` | Yes | Becomes the scenario name in run artifacts and the dashboard. |
| `## Setup` | No | Seeds the twin’s SQLite state before the agent runs. The JSON block is what the parser loads; the surrounding prose is for humans only. |
| `## Prompt` | Yes | The task handed to your agent via the `POME_TASK` environment variable. |
| `## Expected Behavior` | No | Human-readable description for reviewers. Never sent to the agent; available to the evaluator as context. |
| `## Success Criteria` | Yes (≥ 1) | One criterion per line, prefixed with [D] or [P]. |
| `## Config` | No | Per-scenario settings as bare `key: value` pairs or a fenced YAML block. |
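To make the section layout concrete, here is a minimal sketch of how a scenario file could be split into its sections and checked for the required ones. It is not the Pome CLI's parser; the names `split_sections`, `missing_sections`, and `REQUIRED` are hypothetical and exist only for this illustration.

```python
REQUIRED = ("Prompt", "Success Criteria")  # the level-1 title is checked separately


def split_sections(markdown):
    """Split scenario text into {"title": ..., "<Section name>": body} entries."""
    sections = {}
    current = None
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # headings inside fenced blocks are ignored
        if not in_fence and line.startswith("# ") and "title" not in sections:
            sections["title"] = line[2:].strip()
            continue
        if not in_fence and line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
            continue
        if current is not None:
            sections[current] += line + "\n"
    return sections


def missing_sections(sections):
    """List required parts that are absent or empty."""
    missing = [name for name in REQUIRED if not sections.get(name, "").strip()]
    if not sections.get("title"):
        missing.insert(0, "title")
    return missing


example = """# Triage open issues in acme/api

## Prompt
Triage issue #1 in acme/api.

## Success Criteria
- [D] Issue #1 has the `bug` label applied
"""
sections = split_sections(example)
print(missing_sections(sections))  # []
```

A file that omits `## Setup`, `## Expected Behavior`, or `## Config` still passes this check, which matches the Required column above.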

Criteria types

[D] — Deterministic

Deterministic criteria are evaluated by querying the twin’s final SQLite state directly. No LLM call is made. The CLI matches the criterion text against a library of SQL predicates:
  • Issue #N has the <label> label applied → queries issue_labels
  • Issue #N is assigned to <user> → queries issue_assignees
  • No new labels were created → diffs the labels table between seed and final state
Deterministic checks are instant, free, and perfectly reproducible across runs.
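The matching step can be pictured with a small sketch like the one below. The table names `issue_labels` and `issue_assignees` come from the list above, but the column names (`issue_number`, `label`, `login`), the regular expressions, and the `check` function are assumptions made for illustration; the real predicate library may look quite different.

```python
import re
import sqlite3

# Each entry pairs a regex over the criterion text with a SQL count query.
# Column names are assumed for this sketch.
PREDICATES = [
    (re.compile(r"Issue #(\d+) has the `?([\w-]+)`? label applied"),
     "SELECT COUNT(*) FROM issue_labels WHERE issue_number = ? AND label = ?"),
    (re.compile(r"Issue #(\d+) is assigned to `?([\w-]+)`?"),
     "SELECT COUNT(*) FROM issue_assignees WHERE issue_number = ? AND login = ?"),
]


def check(db, criterion):
    """Return True/False for a recognized criterion, or None if nothing matches."""
    for pattern, sql in PREDICATES:
        match = pattern.fullmatch(criterion.strip())
        if match:
            number, value = match.groups()
            (count,) = db.execute(sql, (int(number), value)).fetchone()
            return count > 0
    return None


# In-memory stand-in for a twin's final state after a run.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE issue_labels (issue_number INTEGER, label TEXT);
    CREATE TABLE issue_assignees (issue_number INTEGER, login TEXT);
    INSERT INTO issue_labels VALUES (1, 'bug');
    INSERT INTO issue_assignees VALUES (1, 'alice');
""")

print(check(db, "Issue #1 has the `bug` label applied"))  # True
print(check(db, "Issue #1 is assigned to `bob`"))         # False
print(check(db, "The agent was polite"))                  # None
```

Because the verdict depends only on the database contents, running the same check twice against the same final state gives the same answer.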

[P] — Probabilistic

Probabilistic criteria are evaluated by an LLM judge. You supply your own key (BYOK); Pome never holds or proxies your LLM credentials. The judge receives the criterion text, the final twin state, and the agent’s stdout, and returns a verdict with three fields: passed, confidence, and reasoning.

If a [D] criterion’s text doesn’t match any known SQL pattern, the CLI automatically falls back to [P] mode and logs a notice so you can sharpen the wording.
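The fallback from [D] to [P] can be sketched as a small routing function. Everything here is hypothetical: `route`, `CRITERION`, and the `KNOWN_PATTERNS` list are stand-ins for whatever the CLI uses internally, and the log message is invented for the example.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scenario")

# Matches lines such as "- [D] Issue #1 is assigned to `alice`".
CRITERION = re.compile(r"^-\s*\[(D|P)\]\s+(.+)$")

# Hypothetical stand-ins for the deterministic pattern library.
KNOWN_PATTERNS = [
    re.compile(r"Issue #\d+ has the `?[\w-]+`? label applied"),
    re.compile(r"Issue #\d+ is assigned to `?[\w-]+`?"),
    re.compile(r"No new labels were created"),
]


def route(line):
    """Return (mode, text); mode is 'D' or 'P' after applying the fallback."""
    match = CRITERION.match(line.strip())
    if match is None:
        raise ValueError(f"not a criterion line: {line!r}")
    mode, text = match.groups()
    if mode == "D" and not any(p.fullmatch(text) for p in KNOWN_PATTERNS):
        log.info("no deterministic pattern for %r; falling back to [P]", text)
        mode = "P"
    return mode, text


print(route("- [D] No new labels were created"))  # ('D', 'No new labels were created')
print(route("- [D] The issue looks tidy"))        # ('P', 'The issue looks tidy')
```

A logged fallback is a cue to rephrase the criterion so it matches a deterministic pattern, or to mark it [P] deliberately.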

Config keys

| Key | Type | Default | Notes |
| --- | --- | --- | --- |
| `twins` | string or string[] | inferred from `.pome.json` | Which twin(s) this scenario requires. |
| `timeout` | number (seconds) | 180 | Per-run timeout before the agent is killed. |
| `runs` | number | 1 | How many times to repeat the scenario. Useful for building confidence in [P] criteria. |
| `seed` | string | "default" | Named seed variant. "default" uses the JSON in `## Setup`. |
| `judge` | string | env default | Judge model identifier passed as-is to your configured LLM endpoint (e.g., `gpt-4o-mini`, `claude-haiku-4-5`, `anthropic/claude-haiku-4-5` for OpenRouter). |
| `tags` | string[] | [] | Arbitrary tags for filtering runs with `--tag`. |

Use [D] criteria wherever the success condition can be expressed as a state fact. They are faster, cheaper, and deterministic: a [D] check costs nothing and never flips between runs with the same final state.
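For the bare `key: value` form of the Config section, a reader could look roughly like the sketch below. The defaults for `timeout`, `runs`, `seed`, and `tags` are taken from the config keys table; `parse_config` itself, the comma-separated list syntax, and leaving `twins` and `judge` unset when absent are assumptions for illustration. The fenced YAML form is not handled here.

```python
def parse_config(body):
    """Parse bare 'key: value' lines into a dict, filling documented defaults."""
    config = {"timeout": 180, "runs": 1, "seed": "default", "tags": []}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("timeout", "runs"):
            config[key] = int(value)
        elif key in ("twins", "tags"):
            # Assumed syntax: a single name or a comma-separated list.
            config[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            config[key] = value
    return config


cfg = parse_config("""twins: github
timeout: 60
runs: 1
judge: claude-haiku-4-5
""")
print(cfg["timeout"], cfg["seed"], cfg["twins"])  # 60 default ['github']
```

Keys that are omitted keep their defaults, so a scenario with no Config section at all would run with a 180-second timeout and a single run.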
