Startup XDOF Raises $70M to Solve Robot Training Data Shortage
XDOF, a startup building data pipelines and annotation systems for robot training, has raised $70 million from top investors to address the critical lack of physical interaction data needed to teach robots real-world tasks.

Two weeks after OpenAI announced the revival of its robotics program, startup XDOF is emerging from stealth to tackle one of the biggest bottlenecks in AI development: the shortage of high-quality training data for robots. Unlike language models trained on vast amounts of public text, robots require data capturing physical interactions, which is scarce and hard to collect.
XDOF (pronounced “ecks-doff”) provides data pipelines, collection tools, and annotation systems that frontier AI labs and robotics companies struggle to build in-house. The company has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo. Founded in October 2024, XDOF already counts 20 customers, including several leading AI labs it cannot name, and employs around 60 people.
Co-founder and CEO Philippe Wu encountered the data problem firsthand as a PhD student at UC Berkeley. He realized that training foundation models for robotics first required collecting massive datasets, a chicken-and-egg problem. With co-founders Fred Shentu and Nemo Jin, Wu launched XDOF to create a data ecosystem for robotics companies.
The startup is partnering with UC Berkeley’s AI Research lab to release the ABC dataset, which it claims is the largest collection of high-quality robot training data ever assembled. It includes 130,000 robot manipulation trajectories, 300 hours of simulation, and 100 hours of evaluations. The data has already been used to train robots on tasks like folding T-shirts, flattening boxes, and loading AirPods into their cases.
XDOF plans to operate across three tiers of a data pyramid. The most valuable tier is teleoperation data collected on the specific robot being deployed; the second tier is more general data collected with teleoperated robots; and the third tier is “egocentric” data gathered by humans performing everyday tasks, for which XDOF will develop its own wearable sensors. The company intends to hire and train teleoperators and data collectors worldwide, because the work requires large warehouses, robot maintenance, and operator training, all of which most AI labs prefer to outsource.
The name XDOF plays on the robotics term “degrees of freedom,” which refers to the number of independent motions a robot can perform. The “X” signifies unlimited degrees of freedom, reflecting the company’s ambition to support any type of robot with the necessary training data.


