Decart Unveils AI World Model Capable of Simulating Hours of Realistic Driving
Decart has introduced a new AI-powered world model that can generate hours of photorealistic driving simulations. Explore how the technology works, its capabilities, limitations, and potential impact on autonomous driving, AI training, and virtual environments.
AI startup Decart on Wednesday introduced Oasis 3, its newest interactive world model that can generate photorealistic driving environments in real time. The company has made the model available through an API from launch.
Initially, Decart is focusing on autonomous vehicle developers that require large-scale simulation of rare and difficult driving situations. The company also plans to extend the technology into robotics and other forms of physical AI. However, its broader ambition centres on developers. By providing API access immediately, Decart hopes to cultivate a developer ecosystem around world models, much as OpenAI helped popularise ecosystems around language models.
“It’s going to be the first usable world model that people can actually program on top of,” said Dean Leitersdorf, Decart’s co-founder and CEO. “I think there’s going to be an entire developer community that emerges on top of this.”
The company already claims a developer community of over 100,000 users, many of whom are building products based on Lucy, Decart’s real-time video model. Those applications have largely focused on e-commerce and livestreaming. Oasis 3 builds upon the same foundation technology and represents Decart’s move into physical AI. The company said access costs $0.02 per second, while enterprise pricing varies by use case.
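To put the quoted rate in context, here is a rough back-of-the-envelope sketch of what a long session would cost at $0.02 per second. It assumes billing is strictly linear per second of generation with no minimums or volume discounts, which the company has not confirmed; the function name is illustrative only.

```python
# Quoted API rate from the article, kept in whole cents to avoid float rounding.
PRICE_CENTS_PER_SECOND = 2  # $0.02 per second

def session_cost_usd(seconds: int) -> float:
    """Cost in dollars, ASSUMING simple linear per-second billing."""
    return PRICE_CENTS_PER_SECOND * seconds / 100

one_minute = session_cost_usd(60)
one_hour = session_cost_usd(60 * 60)
print(f"1 minute: ${one_minute:.2f}")
print(f"1 hour:   ${one_hour:.2f}")
```

Under that assumption, an hour of continuous simulation works out to $72.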
Decart is entering an increasingly competitive world-model market. Over the past year, Google released Genie 3 as a research preview, Fei-Fei Li’s World Labs introduced Marble for commercial applications, and video-generation firms such as Luma and Runway have been adapting their physics-aware video systems into world models.
The launch arrives only weeks after Decart secured $300 million in funding. According to Leitersdorf, the round was driven by “huge demand increases for the models we built” across e-commerce, livestreaming, and physical AI. The investment pushed the company’s valuation to nearly $4 billion and attracted strategic backers including Toyota, Adobe, and eBay. Leitersdorf noted that each of those organisations could also become customers. Existing investor Nvidia also participated in the financing round.
According to the company, Oasis 3 distinguishes itself through its photorealistic visuals and its ability to generate environments continuously. Decart attributes that advantage to its efficiency-focused engineering and to its other major product, the DOS (Decart Optimisation Stack). The software is designed to run AI models efficiently across Nvidia, Amazon, and Google hardware, reducing operational costs compared with competing systems.
“This is built on top of our entire real-time stack, which we optimise all the way down to the hardware,” Leitersdorf said. “By being so vertically integrated, we’re able to be more than an order of magnitude cheaper than anyone else in the industry to run these models.”
Leitersdorf also said that Decart has spent “drastically less” than $100 million since its founding, which he attributed to the efficiency of its systems.
Decart says Oasis 3 can generate physically accurate, multi-camera environments that include one forward-facing view and two side-facing views. These environments are intended for training and testing AI systems. Rather than limiting users to short demonstrations or research previews, Decart allows developers to create and explore scenarios indefinitely, which could be particularly useful for autonomous vehicle companies seeking to evaluate a large number of edge cases.
Compared with other world models I have tested, including Google’s Genie 3 and World Labs’ Marble, Oasis 3 produced some of the most photorealistic environments from a single text prompt. The ability to interact with those environments for extended periods also suggests a level of efficiency that some competing systems may not yet match.
However, the model’s quality declines as those generated worlds continue for long periods.
During testing, the system consistently generated compelling scenes that closely matched the initial prompts. Over time, though, thematic consistency weakened. For example, when prompted to create a New York City street in the morning, the model initially generated a convincing version of the setting. Yet after driving through the area for a while, the location gradually lost its distinct New York identity. It began to resemble a generic urban area that could belong to almost any Western city.
Attempts to return to the original starting point revealed additional issues. The initial intersection had disappeared entirely and had been replaced by a completely different environment. The driving controls also felt sluggish and sometimes unresponsive, making it difficult to steer the vehicle consistently. Similar shortcomings have appeared in other world models I have tested. The overall experience often felt less like navigating a stable simulation and more like moving through a dream-like sequence that became increasingly disconnected from its original context.
Another recurring problem involved vehicle interactions. Cars could pass through other cars without collisions, indicating that the environment does not yet model physics accurately. Leitersdorf described this as a “major research problem that we’re cracking now,” explaining that available training data contains significantly more examples of normal driving than crashes or accidents.
Part of the challenge stems from the system’s underlying architecture. Oasis 3 is an autoregressive model, meaning it generates one frame at a time and uses previously generated frames to predict the next frame. This approach is common among world models, but it also demands substantial computing resources.
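The general shape of autoregressive generation can be sketched with a toy loop. This is not Decart's implementation: frames are stand-in integers, `toy_predictor` is a hypothetical placeholder for a real model, and the fixed-size rolling context is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Frame = int  # stand-in for an image; a real model would use tensors

def rollout(
    first_frame: Frame,
    predict_next: Callable[[List[Frame]], Frame],
    steps: int,
    context_size: int,
) -> Tuple[List[Frame], List[Frame]]:
    """Generate frames one at a time, each conditioned on recent frames."""
    # The deque silently drops the oldest frame once it is full.
    context: Deque[Frame] = deque([first_frame], maxlen=context_size)
    frames = [first_frame]
    for _ in range(steps):
        nxt = predict_next(list(context))  # the model sees only the context
        context.append(nxt)
        frames.append(nxt)
    return frames, list(context)

def toy_predictor(context: List[Frame]) -> Frame:
    # Hypothetical model: the next frame depends only on the latest one.
    return context[-1] + 1

frames, final_context = rollout(0, toy_predictor, steps=10, context_size=4)
print(frames)         # every frame generated so far
print(final_context)  # only the most recent frames remain visible
```

In this toy setup the starting frame leaves the context after a few steps, so later predictions can no longer condition on it. That may be one reason a generated world drifts away from its starting point, though the article does not confirm how Oasis 3 manages its context.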
To improve consistency, Leitersdorf said Decart is actively researching methods for extending the model’s memory capabilities.
“Every frame we generate is roughly 8,000 tokens,” he said. “Generating this at tens of frames per second — that’s hundreds of thousands of tokens per second. The context window fills up very quickly. We’re researching how to do longer context to store millions more tokens, and how to compress the memory into fewer tokens.”
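The arithmetic behind that quote is easy to check. The sketch below uses the 8,000-tokens-per-frame figure from Leitersdorf; the frame rates and the one-million-token context size are illustrative assumptions, not numbers Decart has published.

```python
TOKENS_PER_FRAME = 8_000  # figure quoted by Leitersdorf

def tokens_per_second(fps: int) -> int:
    """Token throughput at a given frame rate."""
    return TOKENS_PER_FRAME * fps

def seconds_until_full(context_tokens: int, fps: int) -> float:
    """How long an uncompressed context of the given size would last."""
    return context_tokens / tokens_per_second(fps)

ASSUMED_CONTEXT_TOKENS = 1_000_000  # hypothetical context window

for fps in (20, 30):  # assumed values for "tens of frames per second"
    rate = tokens_per_second(fps)
    lifetime = seconds_until_full(ASSUMED_CONTEXT_TOKENS, fps)
    print(f"{fps} fps: {rate:,} tokens/s, context full after {lifetime:.2f} s")
```

At 20 frames per second that is 160,000 tokens per second, so even a million-token window would hold only a few seconds of raw frames, which is consistent with the company's interest in longer contexts and compressed memory.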
Leitersdorf believes some of the consistency challenges could be addressed in a future version of the model. The upcoming update is expected to allow users to generate worlds from environmental videos rather than static images. He acknowledged that the broader field of world models remains in its early stages.
Despite the technology’s current limitations, the CEO remains focused on the possibilities that could emerge once developers begin experimenting with the platform.
“It takes me back to the early days of LLMs, when OpenAI invented the API for models,” he said, pointing to how developer communities helped uncover new applications and accelerate innovation.
“When we talk again in three months, we’ll be like, ‘Here are 100 developers who all built 100 different applications with Oasis that surprised all of us,’” he said.