Write your first Pome scenario

A scenario is the core unit in Pome — each file defines one test your agent must pass. It combines a seed state (the starting world), a task prompt (what the agent is told to do), and success criteria (how Pome scores the result). You write scenarios in plain Markdown; Pome parses them, seeds a fresh twin, runs your agent, and scores the outcome.

Create the file

Create a new Markdown file in your scenarios/ directory. The filename becomes the scenario slug in traces and run output.

touch scenarios/01-bug-triage.md

Open the file and add a top-level heading. This becomes the scenario name shown in the CLI output and dashboard.

# Bug triage — happy path

Add a Setup section

The `## Setup` section seeds the twin’s initial state. Write a short prose description for human readers, then include a fenced JSON block with the exact seed data. Pome uses only the JSON block at runtime; the prose is ignored by the parser.

## Setup

A repository `acme/api` with labels `bug`, `feature`, and `question` already
created. Collaborators: `alice` (owns orders) and `bob` (owns auth). One open
issue describing a 500 error after a deploy.

```json
{
  "repositories": [
    {
      "owner": "acme",
      "name": "api",
      "description": "API server",
      "labels": [
        { "name": "bug", "color": "d73a4a", "description": "" },
        { "name": "feature", "color": "0e8a16", "description": "" },
        { "name": "question", "color": "cccccc", "description": "" }
      ],
      "collaborators": ["alice", "bob"],
      "issues": [
        {
          "number": 1,
          "title": "500 error on POST /orders after deploy",
          "body": "Started failing right after the 14:00 deploy. Stack trace points to OrderController#create.",
          "state": "open",
          "labels": [],
          "assignees": []
        }
      ]
    }
  ]
}
```
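Since only the JSON block reaches the twin, a malformed block is worth catching before a run. The following is a minimal Python sketch, not Pome's own parser: it assumes the seed is the first fenced json block under the Setup heading, and `extract_seed` is a hypothetical helper name.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def extract_seed(markdown: str) -> dict:
    """Return the parsed JSON seed from the '## Setup' section of a scenario."""
    # Isolate the text between '## Setup' and the next '## ' heading (or end of file).
    section = re.search(r"^## Setup\s*$(.*?)(?=^## |\Z)", markdown, re.M | re.S)
    if section is None:
        raise ValueError("no '## Setup' section found")
    # Take the first fenced json block inside that section.
    block = re.search(FENCE + r"json\s*\n(.*?)\n" + FENCE, section.group(1), re.S)
    if block is None:
        raise ValueError("'## Setup' has no fenced json block")
    return json.loads(block.group(1))  # raises json.JSONDecodeError on bad JSON


# A tiny stand-in scenario, assembled line by line.
scenario = "\n".join([
    "# Bug triage",
    "",
    "## Setup",
    "",
    "Prose for human readers.",
    "",
    FENCE + "json",
    '{"repositories": [{"owner": "acme", "name": "api"}]}',
    FENCE,
    "",
    "## Prompt",
    "",
    "Triage issue #1 in acme/api.",
])
seed = extract_seed(scenario)
print(seed["repositories"][0]["name"])  # api
```

Running a check like this over `scenarios/` before `pome run` surfaces JSON syntax errors without booting a twin.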

Add a Prompt section

The `## Prompt` section contains the task text Pome passes to your agent via the `POME_TASK` environment variable. Write it as you would write a user message to the agent.

## Prompt

Triage issue #1 in acme/api. Read the issue, classify it as bug, feature, or
question, apply the matching label, and assign the right collaborator (alice
owns orders, bob owns auth).
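On the agent side, picking up the prompt is a single environment lookup. The sketch below shows it in Python purely for illustration (the agent on this page is started with `bun`); `read_task` is a hypothetical helper, and only the `POME_TASK` variable name comes from Pome.

```python
import os


def read_task(env=os.environ) -> str:
    """Return the task prompt that Pome passes in POME_TASK."""
    task = env.get("POME_TASK", "").strip()
    if not task:
        # Fail loudly when run outside Pome, so a missing prompt
        # is not mistaken for an empty task.
        raise RuntimeError("POME_TASK is not set; run this agent through `pome run`")
    return task


# Simulate the environment Pome would provide for this scenario.
print(read_task({"POME_TASK": "Triage issue #1 in acme/api."}))
```

An agent entry point would call `read_task()` once at startup and hand the result to its model as the user message.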

Add Success Criteria

The `## Success Criteria` section lists what the agent must achieve. Each line starts with `- [D]` for a deterministic check or `- [P]` for a probabilistic (LLM-judged) check.

## Success Criteria

- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
- [D] No new labels were created
- [P] The agent's final summary mentions classifying as "bug" and assigning "alice"

`[D]` criteria are evaluated by querying the twin’s SQLite state directly, with no LLM call required. `[P]` criteria are evaluated by an LLM judge that reads the final state and the agent’s output.
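As a rough picture of what a deterministic check involves, the sketch below runs one query against an in-memory SQLite database standing in for the twin's final state. The `issue_labels` table name comes from the phrase library on this page; the column names (`issue_number`, `label_name`) and the `issue_has_label` helper are assumptions, and the twin's real schema may differ.

```python
import sqlite3


def issue_has_label(db: sqlite3.Connection, issue_number: int, label: str) -> bool:
    """Deterministic check: is there a matching row in issue_labels?"""
    row = db.execute(
        "SELECT 1 FROM issue_labels WHERE issue_number = ? AND label_name = ? LIMIT 1",
        (issue_number, label),
    ).fetchone()
    return row is not None


# Stand-in for the twin's final state; the column names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE issue_labels (issue_number INTEGER, label_name TEXT)")
db.execute("INSERT INTO issue_labels VALUES (1, 'bug')")

print(issue_has_label(db, 1, "bug"))      # True
print(issue_has_label(db, 1, "feature"))  # False
```

The same inputs always give the same answer, which is the property that makes this kind of check repeatable and free of LLM cost.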

Add a Config section

The optional `## Config` section sets per-scenario runtime options as bare key-value pairs.

## Config

twins: github
timeout: 90
runs: 1

| Key | Purpose |
| --- | --- |
| `twins` | Which twin(s) to boot for this scenario |
| `timeout` | Per-run timeout in seconds (default: 180) |
| `runs` | How many times to repeat the scenario (useful for `[P]` confidence) |
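To make the format concrete, here is a minimal Python sketch of reading such a section. It is not Pome's parser: `parse_config` is a hypothetical helper, it assumes one `key: value` pair per line, and it applies only the documented default (a `timeout` of 180 seconds).

```python
DEFAULT_TIMEOUT = 180  # seconds, the documented default for `timeout`


def parse_config(section: str) -> dict:
    """Parse bare 'key: value' lines from the body of a '## Config' section."""
    config = {"timeout": DEFAULT_TIMEOUT}
    for line in section.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        key, value = key.strip(), value.strip()
        # timeout and runs are numeric; everything else stays a string.
        config[key] = int(value) if key in ("timeout", "runs") else value
    return config


print(parse_config("twins: github\ntimeout: 90\nruns: 1"))
# {'timeout': 90, 'twins': 'github', 'runs': 1}
```

The `twins` value is kept as a raw string because this page does not show how several twins are written.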

Run it

Pass the scenario file and your agent command to `pome run`:

pome run scenarios/01-bug-triage.md --agent "bun run start"

To run every scenario in a directory at once:

pome run scenarios/ --agent "bun run start"

Pome boots a fresh twin, seeds it from your `## Setup` block, runs the agent, scores the criteria, and writes trace artifacts to `runs/`.
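To drive the same command from a script (for example in CI), a thin Python wrapper is enough. The sketch below uses only the `pome run <path> --agent <command>` form shown above; `pome_command` and `run_scenario` are hypothetical helper names, and this page does not say what exit code `pome` returns when a criterion fails, so the return value is passed through uninterpreted.

```python
import shlex
import subprocess


def pome_command(target: str, agent: str) -> list[str]:
    """Build the argv for `pome run <target> --agent <agent>`."""
    # The agent command stays a single argument, so its spaces survive intact.
    return ["pome", "run", target, "--agent", agent]


def run_scenario(target: str, agent: str) -> int:
    """Run one scenario file or directory and return pome's exit code unchanged."""
    return subprocess.run(pome_command(target, agent)).returncode


# Print the equivalent shell command without executing anything.
print(shlex.join(pome_command("scenarios/01-bug-triage.md", "bun run start")))
# pome run scenarios/01-bug-triage.md --agent 'bun run start'
```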

[D] criteria phrase library

The `[D]` evaluator matches criterion text against a small phrase library. Use these exact phrasings to get deterministic, LLM-free scoring:

| Pattern | What it checks |
| --- | --- |
| `Issue #N has the <label> label applied` | Queries `issue_labels` for a matching row |
| `Issue #N is assigned to <user>` | Queries `issue_assignees` for a matching row |
| `Issue #N still has the <label> label` | Checks the label is present in the final state (no-change variant) |
| `No new labels were created` | Diffs the `labels` table between seed state and final state |

Wrap label names and usernames in backticks in the criterion text — for example:

- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`

Use `[D]` criteria whenever possible. They run without any LLM call, cost nothing, and produce the same result every time. Reserve `[P]` for things that are genuinely hard to express as a state assertion, such as the quality or tone of a comment the agent posted.

If the text of a `[D]` criterion doesn’t match any phrase in the library, Pome automatically falls back to `[P]` LLM-judge mode and logs `[D] criterion fell back to [P] mode`. You can sharpen the wording to match a supported pattern, or intentionally leave the criterion to be judged in `[P]` mode.
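If you want to catch accidental fallbacks before a run, you can lint criterion lines yourself. The sketch below approximates the four documented phrasings with regular expressions; the patterns and the `classify` helper are assumptions for illustration, and Pome's actual matcher may be stricter or looser (for example about backticks or casing).

```python
import re

# Regex approximations of the four documented phrasings (assumed, not Pome's own).
PATTERNS = {
    "has_label": re.compile(r"^Issue #(\d+) has the `([^`]+)` label applied$"),
    "assigned_to": re.compile(r"^Issue #(\d+) is assigned to `([^`]+)`$"),
    "still_has_label": re.compile(r"^Issue #(\d+) still has the `([^`]+)` label$"),
    "no_new_labels": re.compile(r"^No new labels were created$"),
}
CRITERION = re.compile(r"^- \[([DP])\] (.+)$")


def classify(line: str) -> str:
    """Return 'D', 'P', or 'D->P' for a [D] line that would likely fall back."""
    match = CRITERION.match(line.strip())
    if match is None:
        raise ValueError(f"not a criterion line: {line!r}")
    kind, text = match.groups()
    if kind == "P":
        return "P"
    if any(pattern.match(text) for pattern in PATTERNS.values()):
        return "D"
    return "D->P"


for line in [
    "- [D] Issue #1 has the `bug` label applied",
    "- [D] Issue #1 was labelled as a bug",
    '- [P] The agent\'s final summary mentions classifying as "bug"',
]:
    print(classify(line), "|", line)
```

The second line is flagged as `D->P` because its wording is not one of the supported phrasings, which is the case the fallback log message reports.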

Complete scenario example

Here is a full scenario file you can use as a starting point:

# Bug triage — happy path

## Setup

A repository `acme/api` with labels `bug`, `feature`, and `question` already
created. Collaborators: `alice` (owns orders) and `bob` (owns auth). One open
issue describing a 500 error after a deploy.

```json
{
  "repositories": [
    {
      "owner": "acme",
      "name": "api",
      "description": "API server",
      "labels": [
        { "name": "bug", "color": "d73a4a", "description": "" },
        { "name": "feature", "color": "0e8a16", "description": "" },
        { "name": "question", "color": "cccccc", "description": "" }
      ],
      "collaborators": ["alice", "bob"],
      "issues": [
        {
          "number": 1,
          "title": "500 error on POST /orders after deploy",
          "body": "Started failing right after the 14:00 deploy. Stack trace points to OrderController#create.",
          "state": "open",
          "labels": [],
          "assignees": []
        }
      ]
    }
  ]
}
```

## Prompt

Triage issue #1 in acme/api. Read the issue, classify it as bug, feature, or
question, apply the matching label, and assign the right collaborator (alice
owns orders, bob owns auth).

## Success Criteria

- [D] Issue #1 has the `bug` label applied
- [D] Issue #1 is assigned to `alice`
- [D] No new labels were created
- [P] The agent's final summary mentions classifying as "bug" and assigning "alice"

## Config

twins: github
timeout: 90
runs: 1
