Patronus AI lands $50M to build digital worlds that stress-test AI agents
Patronus AI, a startup creating simulated digital environments to evaluate AI agents, has raised $50 million in Series B funding, bringing total funding to $70 million.

AI agents are becoming more sophisticated, evolving from answering questions to autonomously executing complex multi-step tasks. But before these agents can be trusted to book trips or perform financial analysis, model providers and startups need to ensure they perform reliably across a vast range of scenarios. Patronus AI, founded in 2023 by former Meta researchers Anand Kannappan and Rebecca Qian, helps model makers and companies fine-tune their models by building simulated digital environments to evaluate agent performance.

The San Francisco-based startup has drawn strong investor interest, and its business is growing quickly: revenue has grown 15-fold over the past year. According to Glenn Solomon, a managing director at Notable Capital, demand for the company's simulated environments is nearly insatiable. On Thursday, the company announced a $50 million Series B round led by Greenfield Partners, with participation from Notable Capital, Lightspeed, Datadog, and Samsung, bringing its total funding to $70 million.

Patronus uses what it calls 'digital world models' to create replicas of websites and internal systems. In these environments, agents are stress-tested after training, using reinforcement learning, which iteratively rewards successful task completion and penalizes errors. AI labs value these simulations because they expose agents to unpredictable scenarios. The company compares its approach to how Waymo trained its autonomous cars, first building synthetic worlds to test vehicles against rare hazards. The difference is that AI agents tend to take shortcuts, which can cause them to complete tasks incorrectly. 'Patronus is really good at spotting the hacks and making sure they are holding the models accountable,' Solomon said.

Currently, Patronus provides simulated digital worlds for software engineering and finance, but according to Kannappan, these are just the start. 'Today we're very focused on the problems that are verifiable, but there are a ton more areas that are very non-verifiable or very hard to verify,' he said. 'We want to be able to actually create the environment in which you can operate an agent that can run for 10 hours or 10 days or 10 weeks.' Patronus's main competition comes from internal teams at AI labs that evaluate agent behavior. Unlike human-data firms such as Mercor and Surge, Patronus evaluates agents without any human involvement.