Introduction
Pome is a simulation testing infrastructure for AI agents and ML models. It allows you to safely test and validate agent behavior before deploying to real users. With deterministic simulations, Pome provides the deep visibility required to transition autonomous agents from experimental prototypes to production-ready tools.
Simulated Evaluations
To evaluate how agents interact with real-world ecosystems, Pome uses stateful digital twins: sandboxed emulations of live third-party APIs. These twins answer the same REST and MCP calls as production systems, allowing you to run comprehensive end-to-end tests, evaluate agent behavior, or train ML models in a completely safe environment. During these evaluations, Pome automatically runs scenarios to identify where agents fail and how they can be optimized. A scenario is defined by three key inputs:
- Starting world state: The twin's seeded baseline configuration (e.g., existing repositories, charges, or communication channels).
- Task to complete: The specific prompt or objective the agent receives.
- Connected services: The external API twins (GitHub, Stripe, Slack, etc.) the agent must interact with.
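To make the three inputs concrete, here is a minimal Python sketch of how a scenario could be modeled as data. This is an illustration only, not Pome's actual schema or API: the `Scenario` class, its field names, the `unseeded_services` helper, and the example repository and channel names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for the three scenario inputs (not Pome's schema)."""

    starting_state: dict[str, Any]  # seeded baseline for each twin, keyed by service
    task: str                       # prompt or objective the agent receives
    services: tuple[str, ...]       # twins the agent must interact with

    def unseeded_services(self) -> list[str]:
        """Return connected services that have no seeded starting state."""
        return [s for s in self.services if s not in self.starting_state]


# Example values are invented for illustration.
scenario = Scenario(
    starting_state={
        "github": {"repositories": ["acme/api"]},
        "slack": {"channels": ["#releases"]},
    },
    task="Open an issue in acme/api and announce it in #releases.",
    services=("github", "slack", "stripe"),
)

print(scenario.unseeded_services())  # ['stripe']
```

Keeping the starting state keyed by service makes it easy to spot a connected twin that would start empty, which is the kind of mismatch worth catching before a run.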
How to start
Quickstart
Install, log in, install skills, and complete a first scored run in minutes.
How Pome works
Twins, scenarios, runs, scoring, and artifacts explained end to end.
/setup
Wire your coding agent to Pome with /pome-setup.