A scenario is the core unit in Pome — each file defines one test your agent must pass. It combines a seed state (the starting world), a task prompt (what the agent is told to do), and success criteria (how Pome scores the result). You write scenarios in plain Markdown; Pome parses them, seeds a fresh twin, runs your agent, and scores the outcome.

1. Create the file

Create a new Markdown file in your scenarios/ directory. The filename becomes the scenario slug in traces and run output.
touch scenarios/01-bug-triage.md
Open the file and add a top-level heading. This becomes the scenario name shown in the CLI output and dashboard.
# Bug triage — happy path

2. Add a Setup section

The ## Setup section seeds the twin’s initial state. Write a short prose description for human readers, then include a fenced JSON block with the exact seed data. Pome uses only the JSON block at runtime; the prose is ignored by the parser.
## Setup

A repository `acme/api` with labels `bug`, `feature`, and `question` already
created. Collaborators: `alice` (owns orders) and `bob` (owns auth). One open
issue describing a 500 error after a deploy.

```json
{
  "repositories": [
    {
      "owner": "acme",
      "name": "api",
      "description": "API server",
      "labels": [
        { "name": "bug", "color": "d73a4a", "description": "" },
        { "name": "feature", "color": "0e8a16", "description": "" },
        { "name": "question", "color": "cccccc", "description": "" }
      ],
      "collaborators": ["alice", "bob"],
      "issues": [
        {
          "number": 1,
          "title": "500 error on POST /orders after deploy",
          "body": "Started failing right after the 14:00 deploy. Stack trace points to OrderController#create.",
          "state": "open",
          "labels": [],
          "assignees": []
        }
      ]
    }
  ]
}
```
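
Because Pome reads only the fenced JSON block, a malformed block is worth catching before a run. The following is a minimal pre-flight sketch, not Pome's own parser: `extract_seed` is a hypothetical helper, and the regular expressions assume the file layout shown above (a `## Setup` heading followed by one fenced `json` block).

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block


def extract_seed(scenario_text: str) -> dict:
    """Return the parsed JSON seed from the '## Setup' section of a scenario."""
    # Isolate the Setup section: from its heading to the next '## ' heading or end of file.
    section = re.search(r"^## Setup[ \t]*$(.*?)(?=^## |\Z)", scenario_text, re.M | re.S)
    if section is None:
        raise ValueError("no '## Setup' section found")
    # Take the first fenced json block inside that section.
    block = re.search(FENCE + r"json\s*\n(.*?)\n" + FENCE, section.group(1), re.S)
    if block is None:
        raise ValueError("no fenced json block in '## Setup'")
    # json.loads raises json.JSONDecodeError if the block is malformed.
    return json.loads(block.group(1))
```

Calling `extract_seed(open("scenarios/01-bug-triage.md").read())` on the file from step 1 should return a dict with a `repositories` key, or raise before any twin is booted.
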

3. Add a Prompt section

The ## Prompt section contains the task text Pome passes to your agent via the POME_TASK environment variable. Write it as you would write a user message to the agent.
## Prompt

Triage issue #1 in acme/api. Read the issue, classify it as bug, feature, or
question, apply the matching label, and assign the right collaborator (alice
owns orders, bob owns auth).
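
On the agent side, the prompt arrives through the POME_TASK environment variable. The sketch below shows one way an agent entry point might read it. Python is used only for illustration (the examples in this guide start the agent with `bun run start`), and `read_task` is a hypothetical helper name.

```python
import os
import sys


def read_task(env=os.environ) -> str:
    """Return the task text Pome passes in POME_TASK, or exit with an error."""
    task = env.get("POME_TASK", "").strip()
    if not task:
        # Outside a Pome run the variable is unset; fail loudly instead of guessing a task.
        sys.exit("POME_TASK is not set; run this agent through `pome run`")
    return task
```

Taking the environment as a parameter keeps the function easy to exercise without launching Pome.
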

4. Add Success Criteria

The ## Success Criteria section lists what the agent must achieve. Each line starts with - [D] for a deterministic check or - [P] for a probabilistic (LLM-judged) check.
## Success Criteria

- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
- [D] No new labels were created
- [P] The agent's final summary mentions classifying as "bug" and assigning "alice"
[D] criteria are evaluated by querying the twin’s SQLite state directly — no LLM call required. [P] criteria are evaluated by an LLM judge that reads the final state and the agent’s output.
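
To make the idea of a state query concrete, here is a rough sketch of what two deterministic checks could look like against a SQLite database. The table names `issue_labels` and `labels` appear in this guide's phrase library; the column names, the in-memory stand-in database, and both function names are assumptions made for illustration, not Pome's actual schema or evaluator.

```python
import sqlite3


def issue_has_label(db: sqlite3.Connection, issue_number: int, label: str) -> bool:
    """Deterministic check: is there an issue_labels row linking the issue and the label?"""
    row = db.execute(
        "SELECT 1 FROM issue_labels WHERE issue_number = ? AND label_name = ? LIMIT 1",
        (issue_number, label),
    ).fetchone()
    return row is not None


def no_new_labels(seed_labels: set[str], db: sqlite3.Connection) -> bool:
    """Deterministic check: the final labels table holds nothing beyond the seeded names."""
    final_labels = {name for (name,) in db.execute("SELECT name FROM labels")}
    return final_labels <= seed_labels


# Stand-in for a twin's final state; table layout and column names are assumed.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE labels (name TEXT PRIMARY KEY);
    CREATE TABLE issue_labels (issue_number INTEGER, label_name TEXT);
    INSERT INTO labels VALUES ('bug'), ('feature'), ('question');
    INSERT INTO issue_labels VALUES (1, 'bug');
    """
)
```

Both checks are plain lookups, which is why the same final state always yields the same verdict.
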

5. Add a Config section

The optional ## Config section sets per-scenario runtime options as bare key-value pairs.
## Config

twins: github
timeout: 90
runs: 1

| Key | Purpose |
| --- | --- |
| `twins` | Which twin(s) to boot for this scenario |
| `timeout` | Per-run timeout in seconds (default: 180) |
| `runs` | How many times to repeat the scenario (useful for [P] confidence) |

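
The bare key-value format is simple enough to read with a few lines of code. The sketch below is one possible reading of it, not Pome's parser: `parse_config` is a hypothetical helper, the comma-separated form for multiple twins is an assumption, and only the `timeout` default of 180 comes from the table above.

```python
def parse_config(section_body: str) -> dict:
    """Parse bare 'key: value' lines into a dict, applying the documented timeout default."""
    config = {"timeout": 180}  # documented default; no other defaults are documented here
    for line in section_body.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        if key == "twins":
            # Assumption: several twins would be written as a comma-separated list.
            config[key] = [name.strip() for name in value.split(",") if name.strip()]
        elif key in ("timeout", "runs"):
            config[key] = int(value)
        else:
            config[key] = value
    return config
```

For the section shown in this step, the result would carry `timeout` as 90 and `runs` as 1, with `twins` as a one-element list.
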

6. Run it

Pass the scenario file and your agent command to pome run:
pome run scenarios/01-bug-triage.md --agent "bun run start"
To run every scenario in a directory at once:
pome run scenarios/ --agent "bun run start"
Pome boots a fresh twin, seeds it from your ## Setup block, runs the agent, scores the criteria, and writes trace artifacts to runs/.
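
If you want to launch runs from a script (for example in CI), the two commands above can be wrapped as follows. This is a thin sketch around the documented `pome run <path> --agent "<command>"` form; the function names are hypothetical, and what the exit code means for failed criteria is not covered in this guide, so the wrapper simply returns it.

```python
import shlex
import subprocess


def pome_run_command(target: str, agent_command: str) -> list[str]:
    """Build the argv for `pome run <target> --agent <agent_command>`."""
    return ["pome", "run", target, "--agent", agent_command]


def run_scenarios(target: str, agent_command: str) -> int:
    """Invoke pome and return its exit code unchanged."""
    command = pome_run_command(target, agent_command)
    print("running:", shlex.join(command))
    # check=False: leave it to the caller to decide how to treat a non-zero exit code.
    return subprocess.run(command, check=False).returncode
```

Passing the agent command as a single argv element avoids shell quoting issues when the command contains spaces.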

[D] criteria phrase library

The [D] evaluator matches criterion text against a small phrase library. Use these exact phrasings to get deterministic, LLM-free scoring:

| Pattern | What it checks |
| --- | --- |
| `Issue #N has the <label> label applied` | Queries issue_labels for a matching row |
| `Issue #N is assigned to <user>` | Queries issue_assignees for a matching row |
| `Issue #N still has the <label> label` | Checks the label is present in the final state (no-change variant) |
| `No new labels were created` | Diffs the labels table between seed state and final state |

Wrap label names and usernames in backticks in the criterion text — for example:
- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
Use [D] criteria whenever possible. They run without any LLM call, cost nothing, and produce the same result every time. Reserve [P] for things that are genuinely hard to express as a state assertion — for example, checking the quality or tone of a comment the agent posted.
If the text of a [D] criterion doesn’t match any phrase in the library, Pome automatically falls back to [P] LLM-judge mode and logs "[D] criterion fell back to [P] mode". You can sharpen the wording to match a supported pattern, or leave it as [P] intentionally.
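
The matching-and-fallback behaviour described above can be sketched with a handful of regular expressions. This is an illustration of the idea only: the regexes, the check names, and `classify` are assumptions, and Pome's real matcher may be stricter or looser about case, spacing, and punctuation.

```python
import re

# The four documented phrasings, with names wrapped in backticks as recommended.
# These regexes are an illustration, not Pome's matcher.
PATTERNS = {
    "has_label": re.compile(r"^Issue #(\d+) has the `([^`]+)` label applied$"),
    "assigned_to": re.compile(r"^Issue #(\d+) is assigned to `([^`]+)`$"),
    "still_has_label": re.compile(r"^Issue #(\d+) still has the `([^`]+)` label$"),
    "no_new_labels": re.compile(r"^No new labels were created$"),
}


def classify(criterion_line: str) -> tuple[str, str, tuple[str, ...]]:
    """Return (mode, check name, captured args) for one '- [D] ...' or '- [P] ...' line."""
    parsed = re.match(r"^- \[([DP])\] (.+)$", criterion_line.strip())
    if parsed is None:
        raise ValueError(f"not a criterion line: {criterion_line!r}")
    mode, text = parsed.groups()
    if mode == "D":
        for name, pattern in PATTERNS.items():
            hit = pattern.match(text)
            if hit:
                return "D", name, hit.groups()
        # No supported phrasing matched: mirror the documented fallback to the LLM judge.
        print("[D] criterion fell back to [P] mode")
    return "P", "llm_judge", ()
```

Running a draft criteria list through a helper like this before a run is a cheap way to spot lines that would silently cost an LLM call.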

Complete scenario example

Here is a full scenario file you can use as a starting point:
# Bug triage — happy path

## Setup

A repository `acme/api` with labels `bug`, `feature`, and `question` already
created. Collaborators: `alice` (owns orders) and `bob` (owns auth). One open
issue describing a 500 error after a deploy.

```json
{
  "repositories": [
    {
      "owner": "acme",
      "name": "api",
      "description": "API server",
      "labels": [
        { "name": "bug", "color": "d73a4a", "description": "" },
        { "name": "feature", "color": "0e8a16", "description": "" },
        { "name": "question", "color": "cccccc", "description": "" }
      ],
      "collaborators": ["alice", "bob"],
      "issues": [
        {
          "number": 1,
          "title": "500 error on POST /orders after deploy",
          "body": "Started failing right after the 14:00 deploy. Stack trace points to OrderController#create.",
          "state": "open",
          "labels": [],
          "assignees": []
        }
      ]
    }
  ]
}
```

## Prompt

Triage issue #1 in acme/api. Read the issue, classify it as bug, feature, or
question, apply the matching label, and assign the right collaborator (alice
owns orders, bob owns auth).

## Success Criteria

- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
- [D] No new labels were created
- [P] The agent's final summary mentions classifying as "bug" and assigning "alice"

## Config

twins: github
timeout: 90
runs: 1
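
Before running a new scenario, it can help to confirm that the file has the pieces this guide walks through. The sketch below is a hypothetical lint, not a Pome feature: it treats the title, Setup, Prompt, and Success Criteria as required and Config as optional, which is an assumption based on the steps above.

```python
import re

REQUIRED_SECTIONS = ("Setup", "Prompt", "Success Criteria")  # Config is optional


def missing_sections(scenario_text: str) -> list[str]:
    """List the parts of a scenario file that appear to be missing."""
    # Collect every second-level heading, e.g. '## Setup' -> 'Setup'.
    headings = set(re.findall(r"^## (.+?)\s*$", scenario_text, re.M))
    problems = [name for name in REQUIRED_SECTIONS if name not in headings]
    # The scenario name comes from a top-level '# ' heading.
    if not re.search(r"^# \S", scenario_text, re.M):
        problems.insert(0, "top-level '# ' heading")
    return problems
```

An empty list means the file has the expected skeleton; it says nothing about whether the seed JSON or the criteria wording is valid.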