OpenAI launches new macOS app for agentic coding
OpenAI has released a new macOS app designed for agentic coding, enabling developers to run autonomous coding workflows, manage tasks, and interact directly with AI-powered tools on Mac.
Artificial intelligence is already reshaping how software is built, with much of the repetitive, labour-intensive programming now handled by coordinated groups of agents and sub-agents. As developers explore new interfaces and form factors for collaborating with AI systems, even the most advanced AI research labs have struggled to keep pace with the rapid evolution of workflows.
One of the dominant trends in this space is agentic software development: systems in which AI agents can independently execute coding tasks. That approach has been popularised by tools such as Claude Code and Cowork. Amid this shift, OpenAI has been steadily expanding its Codex offering, which launched as a command-line tool last April and gained a web-based interface a month later.
Now, OpenAI is making a more decisive move. On Monday, the company introduced a new macOS application for Codex, bringing together many of the agentic development practices that have gained traction over the past year. The macOS app is built to support multiple agents working in parallel, combining agent capabilities with other modern workflows. The launch also arrives less than two months after OpenAI unveiled GPT-5.2-Codex, its most advanced coding model to date, which the company hopes will be compelling enough to draw users away from Claude Code.
“If you want to do truly sophisticated work on something complex, 5.2 is the strongest model available by a wide margin,” OpenAI CEO Sam Altman said during a press call. “That said, it hasn’t always been easy to use. By pairing that level of model capability with a more flexible interface, we think it can make a meaningful difference.”
Despite Altman’s confidence, performance benchmarks paint a more nuanced picture. At the time of publication, GPT-5.2 sits at the top of TerminalBench, which measures how effectively AI models handle command-line programming tasks. However, agents built on Gemini 3 and Claude Opus have achieved broadly comparable scores: slightly lower, but still within the benchmark’s margin of error. Results from SWE-bench, another widely used benchmark that evaluates an AI’s ability to resolve real-world software bugs, show a similar pattern, with no clear leader emerging. Agentic workflows themselves remain difficult to benchmark reliably, and real-world user experience can vary widely even among state-of-the-art models.
The new Codex app also introduces a variety of features that OpenAI says will allow it to match — or in some cases exceed — competing Claude-based tools. One addition is support for automations that can run quietly in the background on a scheduled basis, with outputs saved to a queue for users to review later. The app also lets developers choose from different agent personalities, ranging from pragmatic to empathetic, depending on their preferred working style.
For OpenAI, however, the most compelling argument remains the speed at which AI-assisted development can move. “You can start from a completely blank slate and build a genuinely sophisticated piece of software in just a few hours,” Altman said. “At this point, the main constraint is how fast I can type new ideas — that’s the real bottleneck on what can be built.”