/pome-test runs the scenarios listed in your repo’s TESTS.md against pome’s
hosted twins, reports pass/fail per scenario, and points you at the dashboard for
traces and judge feedback.
Use it after /pome-setup has wired the project, or whenever you want to re-run
evals after changing prompts, tools, or models.
Install the skill
Installing the skill adds both /pome-setup and /pome-test. See /setup for flags.
When to use
- Re-running your agent’s scenario suite after a code or prompt change.
- Validating that a new scenario passes before committing it.
- Getting a quick pass/fail summary without manually calling pome run for each file.
If pome.config.json or TESTS.md is missing, run /setup first.
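The precondition above can be checked before invoking the skill. The following is a minimal sketch, assuming both files sit at the repository root (the page does not say where they live). `missing_prerequisites` is a hypothetical helper, not part of pome.

```python
from pathlib import Path

# Files the skill expects. Their location at the repo root is an assumption.
REQUIRED_FILES = ("pome.config.json", "TESTS.md")


def missing_prerequisites(repo_root: str = ".") -> list[str]:
    """Return the required files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print(f"Missing {', '.join(missing)}: run /setup first.")
    else:
        print("Project is wired; /pome-test can run.")
```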
Invoke
What it does
- Reads TESTS.md: collects every .md scenario path listed under ## Scenarios.
- Confirms scope: prints the planned runs and waits for your OK before spending hosted credits.
- Runs each scenario: calls pome run <path> for every entry. Pome handles twin routing, recording, and scoring.
- Reports results: prints PASS/FAIL, score, local artifact path, and cloud dashboard URL per scenario.
- Surfaces failures: for failing runs, includes judge handoff text you can paste back into your coding agent as a fix prompt.
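The first and third steps can be approximated in a short script. This is only a sketch of the idea, not the skill's actual implementation. It assumes TESTS.md lists one scenario path per bullet under ## Scenarios, optionally wrapped in backticks; the real file layout may differ. The scenario file names in the sample are hypothetical, and `collect_scenarios` and `run_scenarios` are illustrative helpers.

```python
import subprocess


def collect_scenarios(tests_md: str) -> list[str]:
    """Collect .md paths listed as bullets under the '## Scenarios' heading."""
    paths: list[str] = []
    in_section = False
    for line in tests_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Enter the section on '## Scenarios', leave it on any other H2.
            in_section = stripped[3:].strip().lower() == "scenarios"
            continue
        if in_section and stripped.startswith(("-", "*")):
            entry = stripped.lstrip("-* ").strip("`").strip()
            if entry.endswith(".md"):
                paths.append(entry)
    return paths


def run_scenarios(paths: list[str]) -> dict[str, int]:
    """Call `pome run <path>` for each entry and return exit codes by path."""
    results: dict[str, int] = {}
    for path in paths:
        completed = subprocess.run(["pome", "run", path])
        results[path] = completed.returncode
    return results


# Hypothetical TESTS.md content, used only to demonstrate the parser.
SAMPLE = """\
# Tests

## Scenarios
- scenarios/triage-issue.md
- `scenarios/review-pr.md`

## Notes
- not-a-scenario.md
"""

print(collect_scenarios(SAMPLE))
# prints ['scenarios/triage-issue.md', 'scenarios/review-pr.md']
```

`run_scenarios` is defined but not called here, since it spends hosted credits. The skill asks for confirmation before that step, and a script should do the same.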
Output contract
- Each scenario produces a run on app.pome.sh.
- A pass/fail summary appears in the chat.
- Full traces live at runs/<scenario>/<run-id>/ locally.
Exit codes
pome run exit codes the skill interprets:
| Code | Meaning |
|---|---|
| 0 | Passed (score ≥ passThreshold) |
| 1 | Scored below threshold |
| 2 | Runner or twin error |
| 3 | Auth error — re-run pome login |
| 4 | Quota exceeded |
| 5 | Usage error (bad flags, missing files) |
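If you call pome run from your own script, the table above translates directly into a lookup. The following is a minimal sketch with hypothetical helper names. Grouping codes 0 and 1 as "scored" results and 2 through 5 as problems to fix before re-running is an interpretation of the table, not something pome documents here.

```python
# Meanings copied from the exit-code table.
EXIT_CODES = {
    0: "Passed (score >= passThreshold)",
    1: "Scored below threshold",
    2: "Runner or twin error",
    3: "Auth error: re-run pome login",
    4: "Quota exceeded",
    5: "Usage error (bad flags, missing files)",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning of a pome run exit code."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def is_scored(code: int) -> bool:
    """True when the scenario ran and was scored (pass or fail)."""
    return code in (0, 1)


for code in (0, 3):
    label = "scored" if is_scored(code) else "not scored"
    print(f"{code}: {describe_exit(code)} [{label}]")
```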
Browse and add scenarios
TESTS.md:
Next
/setup
First-time wire-up with /pome-setup.
Dashboard
Inspect traces and judge feedback for each run.
pome run
Run scenarios directly from the CLI.
GitHub twin
Scenarios and twin reference for GitHub agents.