/pome-test runs the scenarios listed in your repo’s TESTS.md against pome’s hosted twins, reports pass/fail per scenario, and points you at the dashboard for traces and judge feedback. Use it after /pome-setup has wired the project, or whenever you want to re-run evals after changing prompts, tools, or models.

Install the skill

pome skills install
Installs both /pome-setup and /pome-test. See /setup for flags.

When to use

  • Re-running your agent’s scenario suite after a code or prompt change.
  • Validating that a new scenario passes before committing it.
  • Getting a quick pass/fail summary without manually calling pome run for each file.
If pome.config.json or TESTS.md is missing, run /pome-setup first (see /setup).

Invoke

test my agents with pome
Or explicitly:
use /pome-test

What it does

  1. Reads TESTS.md — collects every .md scenario path listed under ## Scenarios.
  2. Confirms scope — prints the planned runs and waits for your OK before spending hosted credits.
  3. Runs each scenario — calls pome run <path> for every entry. Pome handles twin routing, recording, and scoring.
  4. Reports results — prints PASS/FAIL, score, local artifact path, and cloud dashboard URL per scenario.
  5. Surfaces failures — for failing runs, includes judge handoff text you can paste back into your coding agent as a fix prompt.
The skill also adds matching scenarios from the bundled library when your agent talks to a service pome ships twins for.
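Step 1 above can be pictured with a short sketch. This is not the skill's actual parser. It assumes TESTS.md lists scenarios as "-" or "*" bullets under a "## Scenarios" heading, as in the example on this page, and the function name collect_scenarios is hypothetical.

```python
import re

# Hypothetical helper: not part of pome. It assumes scenario paths are
# plain "-" or "*" bullets under a "## Scenarios" heading in TESTS.md.
HEADING = re.compile(r"^(#+)\s+(.*)$")
BULLET = re.compile(r"^[-*]\s+(\S+\.md)$")


def collect_scenarios(tests_md: str) -> list[str]:
    """Return the .md paths listed under the '## Scenarios' heading."""
    paths: list[str] = []
    in_section = False
    for line in tests_md.splitlines():
        stripped = line.strip()
        heading = HEADING.match(stripped)
        if heading:
            # Any heading either opens the Scenarios section or closes it.
            level, title = len(heading.group(1)), heading.group(2).strip()
            in_section = level == 2 and title == "Scenarios"
            continue
        if in_section:
            bullet = BULLET.match(stripped)
            if bullet:
                paths.append(bullet.group(1))
    return paths
```

Bullets outside the Scenarios section are ignored, so notes elsewhere in TESTS.md that mention .md files would not be picked up as runs.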

Output contract

  • Each scenario produces a run on app.pome.sh.
  • A pass/fail summary appears in the chat.
  • Full traces live at runs/<scenario>/<run-id>/ locally.

Exit codes

pome run exit codes the skill interprets:
Code  Meaning
0     Passed (score ≥ passThreshold)
1     Scored below threshold
2     Runner or twin error
3     Auth error; re-run pome login
4     Quota exceeded
5     Usage error (bad flags, missing files)
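If you script pome run yourself instead of going through the skill, the exit codes above could be handled roughly as follows. This is a sketch under assumptions: the names describe_exit, run_scenario, and summarize are hypothetical, and the PASS/FAIL/ERROR labels are an illustrative choice, not the skill's exact output format.

```python
import subprocess

# Meanings copied from the exit code table on this page.
EXIT_CODE_MEANINGS = {
    0: "Passed (score >= passThreshold)",
    1: "Scored below threshold",
    2: "Runner or twin error",
    3: "Auth error; re-run pome login",
    4: "Quota exceeded",
    5: "Usage error (bad flags, missing files)",
}


def describe_exit(code: int) -> str:
    """Map a pome run exit code to its documented meaning."""
    return EXIT_CODE_MEANINGS.get(code, f"Unknown exit code {code}")


def run_scenario(path: str) -> int:
    """Run one scenario with `pome run <path>` and return its exit code."""
    return subprocess.run(["pome", "run", path]).returncode


def summarize(results: dict[str, int]) -> list[str]:
    """Build one summary line per scenario from {path: exit_code}."""
    lines = []
    for path, code in results.items():
        # Assumption: 0 is a pass, 1 is a scored failure, and anything
        # else is an error that prevented a usable score.
        status = "PASS" if code == 0 else "FAIL" if code == 1 else "ERROR"
        lines.append(f"{status}  {path}  ({describe_exit(code)})")
    return lines
```

Separating scored failures (code 1) from errors (codes 2 to 5) keeps a broken login or exhausted quota from being mistaken for a regression in the agent.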

Browse and add scenarios

pome scenarios github
pome scenarios github --copy
pome scenarios stripe --copy
Add paths to TESTS.md:
## Scenarios

- scenarios/01-bug-happy-path.md
- scenarios/10-stripe-create-payment-intent.md
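Editing TESTS.md by hand is enough for a few scenarios. If you copy many at once, a small helper can append paths for you. The sketch below is not a pome command; add_scenario is a hypothetical function that assumes the "## Scenarios" bullet layout shown above.

```python
def add_scenario(tests_md: str, path: str) -> str:
    """Return TESTS.md text with '- <path>' listed under '## Scenarios'."""
    lines = tests_md.splitlines()
    bullet = f"- {path}"
    if any(line.strip() == bullet for line in lines):
        return tests_md  # already listed; leave the file untouched

    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "## Scenarios"),
        None,
    )
    if start is None:
        # No Scenarios section yet: append one at the end of the file.
        return "\n".join(lines + ["", "## Scenarios", "", bullet]) + "\n"

    # The section ends at the next heading, or at the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("#"):
            end = i
            break

    # Step back over trailing blank lines so the bullet joins the list.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, bullet)
    return "\n".join(lines) + "\n"
```

The function is idempotent: adding a path that is already listed returns the text unchanged.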
See per-twin catalogs on the GitHub, Stripe, and Slack reference pages.

Next

  • /setup: First-time wire-up with /pome-setup.
  • Dashboard: Inspect traces and judge feedback for each run.
  • pome run: Run scenarios directly from the CLI.
  • GitHub twin: Scenarios and twin reference for GitHub agents.