/test-with-pome

/pome-test runs the scenarios listed in your repo’s TESTS.md against pome’s hosted twins, reports pass/fail per scenario, and points you at the dashboard for traces and judge feedback. Use it after /pome-setup has wired the project, or whenever you want to re-run evals after changing prompts, tools, or models.

Install the skill

pome skills install

Installs both /pome-setup and /pome-test. See /setup for flags.

When to use

Re-running your agent’s scenario suite after a code or prompt change.
Validating that a new scenario passes before committing it.
Getting a quick pass/fail summary without manually calling pome run for each file.

If pome.config.json or TESTS.md is missing, run /pome-setup first (see /setup).
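
To check for the two files before invoking the skill, a preflight along these lines may help. This is a minimal Python sketch, assuming both files sit at the repository root; `missing_setup_files` is a hypothetical helper name and not part of pome.

```python
from pathlib import Path

# Files the skill expects, per the note above. Their location at the
# repository root is an assumption of this sketch.
REQUIRED_FILES = ("pome.config.json", "TESTS.md")


def missing_setup_files(root: str = ".") -> list[str]:
    """Return the required files that are absent under `root`."""
    base = Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list from `missing_setup_files()` suggests the project is already wired. Any returned name means /pome-setup should run first.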

Invoke

test my agents with pome

Or explicitly:

use /pome-test

What it does

Reads TESTS.md — collects every .md scenario path listed under ## Scenarios.
Confirms scope — prints the planned runs and waits for your OK before spending hosted credits.
Runs each scenario — calls pome run <path> for every entry. Pome handles twin routing, recording, and scoring.
Reports results — prints PASS/FAIL, score, local artifact path, and cloud dashboard URL per scenario.
Surfaces failures — for failing runs, includes judge handoff text you can paste back into your coding agent as a fix prompt.

The skill also adds matching scenarios from the bundled library when your agent talks to a service pome ships twins for.
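
The read-and-run steps above can be approximated in a few lines of Python. This is an illustrative sketch only, not the skill's actual implementation. It assumes scenarios are listed as `- path` bullets under a `## Scenarios` heading and that the section ends at the next heading. `collect_scenarios` and `run_all` are hypothetical names. The only CLI call used is `pome run <path>`, as described above.

```python
import subprocess


def collect_scenarios(tests_md: str) -> list[str]:
    """Return the .md paths listed as bullets under '## Scenarios'."""
    paths = []
    in_section = False
    for raw in tests_md.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Any heading either opens or closes the Scenarios section.
            in_section = line == "## Scenarios"
            continue
        if in_section and line.startswith("- "):
            entry = line[2:].strip()
            if entry.endswith(".md"):
                paths.append(entry)
    return paths


def run_all(paths: list[str], run=subprocess.run) -> dict[str, int]:
    """Call `pome run <path>` per scenario and record each exit code."""
    results = {}
    for path in paths:
        # `run` is injectable so the loop can be exercised without the CLI.
        results[path] = run(["pome", "run", path]).returncode
    return results
```

Passing the contents of TESTS.md to `collect_scenarios` yields the planned runs. Printing that list and asking for confirmation before calling `run_all` mirrors the scope check the skill performs.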

Output contract

Each scenario produces a run on app.pome.sh.
A pass/fail summary appears in the chat.
Full traces live at runs/<scenario>/<run-id>/ locally.
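
To locate the local artifacts for a scenario, something like the following may be enough. It is a sketch under assumptions: the `runs/<scenario>/<run-id>/` layout comes from the line above, but this page does not specify how `<scenario>` and `<run-id>` are named, so the helper only lists directories and sorts them by name. Name order is not guaranteed to be chronological. `list_run_dirs` is a hypothetical helper.

```python
from pathlib import Path


def list_run_dirs(scenario: str, runs_root: str = "runs") -> list[Path]:
    """Return the run directories under runs/<scenario>/, sorted by name."""
    scenario_dir = Path(runs_root) / scenario
    if not scenario_dir.is_dir():
        # No local runs recorded for this scenario yet.
        return []
    return sorted(p for p in scenario_dir.iterdir() if p.is_dir())
```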

Exit codes

pome run exit codes the skill interprets:

Code	Meaning
0	Passed (score ≥ passThreshold)
1	Scored below threshold
2	Runner or twin error
3	Auth error — re-run `pome login`
4	Quota exceeded
5	Usage error (bad flags, missing files)
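
If you wrap pome run in your own scripts, the table can be mirrored as a small lookup. The sketch below is illustrative: the meanings are copied from the table, while `describe_exit` and `is_scored` are hypothetical helpers. Treating only codes 0 and 1 as scored runs is this sketch's reading of the table, not a documented guarantee.

```python
# Exit codes and meanings, copied from the table above.
EXIT_MEANINGS = {
    0: "Passed (score >= passThreshold)",
    1: "Scored below threshold",
    2: "Runner or twin error",
    3: "Auth error; re-run `pome login`",
    4: "Quota exceeded",
    5: "Usage error (bad flags, missing files)",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning for a pome run exit code."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def is_scored(code: int) -> bool:
    """True when the run produced a score (assumed: codes 0 and 1 only)."""
    return code in (0, 1)
```

Separating scored outcomes from the other codes lets a wrapper retry or re-authenticate on codes 2 to 5 instead of reporting them as scenario failures.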

Browse and add scenarios

pome scenarios github
pome scenarios github --copy
pome scenarios stripe --copy

Add paths to TESTS.md:

## Scenarios

- scenarios/01-bug-happy-path.md
- scenarios/10-stripe-create-payment-intent.md
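
Adding a path by hand is usually enough. If you script it, the edit could look like the sketch below. It assumes the `## Scenarios` heading already exists and that entries are `- path` bullets as in the example above. `add_scenario` is a hypothetical helper, not a pome command.

```python
def add_scenario(tests_md: str, path: str) -> str:
    """Return TESTS.md text with `- path` listed under '## Scenarios'."""
    lines = tests_md.splitlines()
    entry = f"- {path}"
    if any(line.strip() == entry for line in lines):
        return tests_md  # already listed; leave the file untouched

    starts = [i for i, line in enumerate(lines) if line.strip() == "## Scenarios"]
    if not starts:
        raise ValueError("TESTS.md has no '## Scenarios' heading")
    start = starts[0]

    # The section ends at the next heading or at the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("#"):
            end = i
            break

    # Insert after the last non-blank line of the section.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"
```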

See per-twin catalogs on the GitHub, Stripe, and Slack reference pages.

Next

/setup: First-time wire-up with /pome-setup.
Dashboard: Inspect traces and judge feedback for each run.
pome run: Run scenarios directly from the CLI.
GitHub twin: Scenarios and twin reference for GitHub agents.
