Robot Training Data Collection Becomes a Growing Business as AI Labs Turn to XDOF

Discover how AI companies are outsourcing robot training data collection to XDOF as demand for high-quality real-world robotics data continues to grow. Learn why this behind-the-scenes work is becoming essential for the future of AI-powered robots.

Jun 28, 2026 - 07:10
Image Credits: XDOF

As leading AI companies push deeper into robotics, one challenge has become increasingly clear: there is not enough high-quality training data to teach robots how to interact with the physical world. Unlike large language models, which were trained on vast collections of publicly available text, robotics systems require detailed data that captures real-world physical interactions.

That growing need has created a new business opportunity. XDOF (pronounced “ecks-doff”), a robotics infrastructure startup emerging from stealth, aims to build the data collection, annotation, and management systems needed to train next-generation robots. To support that vision, the company has raised $70 million from investors including Thrive Capital, Spark Capital, Andreessen Horowitz (a16z), Lux Capital, and WndrCo.

Co-founder and CEO Philipp Wu said XDOF already employs around 60 people and is working with approximately 20 customers, including several frontier AI laboratories, although he declined to identify them publicly.

“All of the top labs are trying to pursue robotics,” Wu said. “Physical AI is the next frontier, and no one wants to fall behind.”

Wu first encountered the challenge while pursuing his PhD at the University of California, Berkeley, where his research focused on enabling robots to learn from large-scale datasets. The problem, he explained, was that those datasets did not exist.

“There was this chicken-and-egg problem,” Wu said. “We first needed to collect large amounts of data before we could even begin training foundation models for robotics.”

Wu later collaborated with Fred Shentu, a future XDOF co-founder and CTO, on GELLO, a low-cost teleoperation system that allows human operators to control robotic arms and generate training data. The project became widely adopted within the robotics research community because it addressed one of the industry’s biggest bottlenecks.

Recognising the opportunity, Wu, Shentu, and COO Nemo Jin founded XDOF in October 2024. Rather than simply supplying data, the company is building an integrated ecosystem that spans data collection, cleaning, annotation, tooling, and quality verification, with the aim of continuously improving robot training pipelines.

As part of its launch, XDOF has partnered with UC Berkeley’s AI Research Lab to release what it describes as one of the largest publicly available robot training datasets ever assembled. Known as ABC, the dataset contains 130,000 robot manipulation trajectories, 300 hours of simulation data, and 100 hours of evaluation data.

Researchers have already used the dataset to train robots on practical tasks such as folding T-shirts, flattening cardboard boxes, and placing AirPods into their charging cases.

Building the Physical AI Data Pipeline

XDOF plans to gather data across three levels. The highest-quality data comes directly from teleoperated versions of the same robots that will eventually be deployed. A second layer uses general teleoperated robots, similar to the GELLO system. The third focuses on “egocentric” data collected from humans engaged in everyday activities using the company’s wearable sensors. The tiered approach underscores that hardware choices directly influence data quality, making sensor design and collection methods just as important as the AI models themselves.

To scale data collection, XDOF intends to build a global workforce of teleoperators and human data collectors. According to Wu, creating this infrastructure requires extensive facilities, large robot fleets, ongoing calibration, and highly trained operators—resources that many AI laboratories would prefer to outsource rather than develop internally.

The company’s name comes from the robotics term “degrees of freedom,” which measures the number of independent movements a robot can perform. While a human arm has seven degrees of freedom and Figure AI’s latest humanoid robot has around 30, the “X” in XDOF represents the company’s broader ambition: enabling robotics systems with virtually unlimited degrees of movement through better data and more capable AI models.

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.