OpenAI on Monday released a new desktop application for its Codex artificial intelligence coding system, a tool the company says transforms software development from a collaborative exercise with a single AI assistant into something more akin to managing a team of autonomous workers.
The Codex app for macOS functions as what OpenAI executives describe as a “command center for agents,” allowing developers to delegate multiple coding tasks simultaneously, automate repetitive work, and supervise AI systems that can run for up to 30 minutes independently before returning completed code.
“This is the most loved internal product we’ve ever had,” Sam Altman, OpenAI’s chief executive, told VentureBeat in a press briefing ahead of Monday’s launch. “It’s been totally an amazing thing for us to be using recently at OpenAI.”
The release arrives at a pivotal moment for the enterprise AI market. According to a survey of 100 Global 2000 companies published last week by venture capital firm Andreessen Horowitz, 78% of enterprise CIOs now use OpenAI models in production, though competitors Anthropic and Google are gaining ground rapidly. Anthropic posted the largest share increase of any frontier lab since May 2025, growing 25% in enterprise penetration, with 44% of enterprises now using Anthropic in production.
The timing of OpenAI’s Codex app launch — with its focus on professional software engineering workflows — appears designed to defend the company’s position in what has become the most contested segment of the AI market: coding tools.
Why developers are abandoning their IDEs for AI agent management
The Codex app introduces a fundamentally different approach to AI-assisted coding. While previous tools like GitHub Copilot focused on autocompleting lines of code in real-time, the new application enables developers to “effortlessly manage multiple agents at once, run work in parallel, and collaborate with agents over long-running tasks.”
Alexander Embiricos, the product lead for Codex, explained the evolution during the press briefing by tracing the product’s lineage back to 2021, when OpenAI first introduced a model called Codex that powered GitHub Copilot.
“Back then, people were using AI to write small chunks of code in their IDEs,” Embiricos said. “GPT-5 in August last year was a big jump, and then 5.2 in December was another massive jump, where people started doing longer and longer tasks, asking models to do work end to end. So what we saw is that developers, instead of working closely with the model, pair coding, they started delegating entire features.”
The shift has been so profound that Altman said he recently completed a substantial coding project without ever opening a traditional integrated development environment.
“I was astonished by this…I did this fairly big project in a few days earlier this week and over the weekend. I did not open an IDE during the process. Not a single time,” Altman said. “I did look at some code, but I was not doing it the old-fashioned way, and I did not think that was going to be happening by now.”
How skills and automations extend AI coding beyond simple code generation
The Codex app introduces several new capabilities designed to extend AI coding beyond writing lines of code. Chief among these are “Skills,” which bundle instructions, resources, and scripts so that Codex can “reliably connect to tools, run workflows, and complete tasks according to your team’s preferences.”
The app includes a dedicated interface for creating and managing skills, and users can explicitly invoke specific skills or allow the system to automatically select them based on the task at hand. OpenAI has published a library of skills for common workflows, including tools to fetch design context from Figma, manage projects in Linear, deploy web applications to cloud hosts like Cloudflare and Vercel, generate images using GPT Image, and create professional documents in PDF, spreadsheet, and Word formats.
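OpenAI has not published a full technical spec for the skill format in this announcement, but the description (bundled instructions, resources, and scripts) suggests something like the following sketch. The manifest fields, file name, and export() helper below are illustrative assumptions rather than the documented format; they only show the rough shape of a document-export skill a team might register.

```python
# Hypothetical sketch of what a "create documents" style skill might bundle.
# The real Codex skill format is not documented in this announcement; the
# manifest fields and entry point here are illustrative assumptions only.

import csv
import pathlib

# A skill bundles instructions the model reads, resources it can reference,
# and scripts it can execute on the user's behalf.
SKILL_MANIFEST = {
    "name": "export-spreadsheet",
    "description": "Turn a list of records into a spreadsheet the agent can attach.",
    "instructions": "Call export(rows, path) with the rows gathered during the task.",
}

def export(rows: list[dict], path: str) -> pathlib.Path:
    """Write records to a CSV file so downstream tools can open it as a spreadsheet."""
    if not rows:
        raise ValueError("no rows to export")
    out = pathlib.Path(path)
    with out.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out

if __name__ == "__main__":
    export(
        [{"task": "triage new issues", "status": "done"},
         {"task": "summarize CI failures", "status": "queued"}],
        "daily_report.csv",
    )
```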
To demonstrate the system’s capabilities, OpenAI asked Codex to build a racing game from a single prompt. Using an image generation skill and a web game development skill, Codex worked independently through more than 7 million tokens from that single prompt, taking on “the roles of designer, game developer, and QA tester to validate its work by actually playing the game.”
The company has also introduced “Automations,” which let developers schedule Codex to run recurring work in the background. “When an Automation finishes, the results land in a review queue so you can jump back in and continue working if needed.”
Thibault Sottiaux, who leads the Codex team at OpenAI, described how the company uses these automations internally: “We’ve been using Automations to handle the repetitive but important tasks, like daily issue triage, finding and summarizing CI failures, generating daily release briefs, checking for bugs, and more.”
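OpenAI has not detailed the Automations configuration format itself, but the workflow Sottiaux describes maps onto a familiar pattern: a recurring job whose output is parked somewhere for a human to review later. The sketch below is a local stand-in for that pattern, not the Codex Automations API; the file name and the crude commit-message heuristic are assumptions for illustration.

```python
# Minimal stand-in for the pattern Automations automate: a recurring job whose
# results land in a queue for later human review. This is not the Codex
# Automations API (which OpenAI has not detailed here), just the shape of it.

import datetime
import json
import pathlib
import subprocess
import time

REVIEW_QUEUE = pathlib.Path("review_queue.jsonl")

def summarize_recent_failures(repo: str = ".") -> dict:
    """Crude daily triage: flag commit subjects from the last day that mention fixes or reverts."""
    log = subprocess.run(
        ["git", "log", "--since=1.day.ago", "--pretty=format:%s"],
        cwd=repo, capture_output=True, text=True,
    ).stdout
    flagged = [s for s in log.splitlines() if "fix" in s.lower() or "revert" in s.lower()]
    return {
        "generated_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "flagged_commits": flagged,
    }

def run_daily(interval_seconds: int = 24 * 60 * 60) -> None:
    """Run the job on a fixed interval and append each result to the review queue."""
    while True:
        result = summarize_recent_failures()
        with REVIEW_QUEUE.open("a") as queue:
            queue.write(json.dumps(result) + "\n")
        time.sleep(interval_seconds)

if __name__ == "__main__":
    run_daily()
```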
The app also includes built-in support for “worktrees,” allowing multiple agents to work on the same repository without conflicts. “Each agent works on an isolated copy of your code, allowing you to explore different paths without needing to track how they impact your codebase.”
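The underlying Git mechanism is standard: the git worktree command lets several checkouts of the same repository exist side by side, each on its own branch. Codex manages this internally, but a rough sketch of the idea looks like the following; the helper functions and branch naming are illustrative, not how the app actually organizes its worktrees.

```python
# Sketch of the Git mechanism behind this isolation: `git worktree` gives each
# agent its own checkout and branch of the same repository. These helpers are
# illustrative only; Codex manages its worktrees internally.

import pathlib
import subprocess

def create_agent_worktree(repo: str, agent_name: str) -> pathlib.Path:
    """Create an isolated checkout on a fresh branch for one agent's task."""
    worktree_path = pathlib.Path(repo).resolve().parent / f"{agent_name}-worktree"
    subprocess.run(
        ["git", "worktree", "add", "-b", f"agent/{agent_name}", str(worktree_path)],
        cwd=repo, check=True,
    )
    return worktree_path

def remove_agent_worktree(repo: str, worktree_path: pathlib.Path) -> None:
    """Tear the worktree down once the agent's branch has been reviewed or merged."""
    subprocess.run(["git", "worktree", "remove", str(worktree_path)], cwd=repo, check=True)

if __name__ == "__main__":
    # Two agents can now edit the same repository without stepping on each other.
    for agent in ("refactor-auth", "add-tests"):
        print(create_agent_worktree(".", agent))
```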
OpenAI battles Anthropic and Google for control of enterprise AI spending
The launch comes as enterprise spending on AI coding tools accelerates dramatically. According to the Andreessen Horowitz survey, average enterprise AI spend on large language models has risen from approximately $4.5 million to $7 million over the last two years, with enterprises expecting growth of another 65% this year to approximately $11.6 million.
Leadership in the enterprise AI market varies significantly by use case. OpenAI dominates “early, horizontal use cases like general purpose chatbots, enterprise knowledge management and customer support,” while Anthropic leads in “software development and data analysis, where CIOs consistently cite rapid capability gains since the second half of 2024.”
When asked during the press briefing how Codex differentiates from Anthropic’s Claude Code, which has been described as having its “ChatGPT moment,” Sottiaux emphasized OpenAI’s focus on model capability for long-running tasks.
“One of the things that our models are extremely good at—they really sit at the frontier of intelligence and doing reliable work for long periods of time,” Sottiaux said. “This is also what we’re optimizing this new surface to be very good at, so that you can start many parallel agents and coordinate them over long periods of time and not get lost.”
Altman added that while many tools can handle “vibe coding front ends,” OpenAI’s 5.2 model remains “the strongest model by far” for sophisticated work on complex systems.
“Taking that level of model capability and putting it in an interface where you can do what Thibault was saying, we think is going to matter quite a bit,” Altman said. “That’s probably, at least listening to users and sort of looking at the chatter on social, the single biggest differentiator.”
The surprising constraint on AI progress: how fast humans can type
The philosophical underpinning of the Codex app reflects a view that OpenAI executives have been articulating for months: that human limitations — not AI capabilities — now constitute the primary constraint on productivity.
In a December appearance on Lenny’s Podcast, Embiricos described human typing speed as “the current underappreciated limiting factor” to achieving artificial general intelligence. The logic: if AI can perform complex coding tasks but humans can’t write prompts or review outputs fast enough, progress stalls.
The Codex app attempts to address this by enabling what the team calls an “abundance mindset” — running multiple tasks in parallel rather than perfecting single requests. During the briefing, Embiricos described how power users at OpenAI work with the tool.
“Last night, I was working on the app, and I was making a few changes, and all of these changes are able to run in parallel together. And I was just sort of going between them, managing them,” Embiricos said. “Behind the scenes, all these tasks are running on something called Git worktrees, which means that the agents are running independently, and you don’t have to manage them.”
In the Sequoia Capital podcast “Training Data,” Embiricos elaborated on this mindset shift: “The mindset that works really well for Codex is, like, kind of like this abundance mindset and, like, hey, let’s try anything. Let’s try anything even multiple times and see what works.” He noted that when users run 20 or more tasks in a day or an hour, “they’ve probably understood basically how to use the tool.”
Building trust through sandboxes: how OpenAI secures autonomous coding agents
OpenAI has built security measures into the Codex architecture from the ground up. The app uses “native, open-source and configurable system-level sandboxing,” and by default, “Codex agents are limited to editing files in the folder or branch where they’re working and using cached web search, then asking for permission to run commands that require elevated permissions like network access.”
Embiricos elaborated on the security approach during the briefing, noting that OpenAI has open-sourced its sandbox technology.
“Codex has this sandbox that we’re actually incredibly proud of, and it’s open source, so you can go check it out,” Embiricos said. The sandbox “basically ensures that when the agent is working on your computer, it can only make writes in a specific folder that you want it to make writes into, and it doesn’t access the network without permission.”
The system also includes a granular permission model that allows users to configure persistent approvals for specific actions, avoiding the need to repeatedly authorize routine operations. “If the agent wants to do something and you find yourself annoyed that you’re constantly having to approve it, instead of just saying, ‘All right, you can do everything,’ you can just say, ‘Hey, remember this one thing — I’m actually okay with you doing this going forward,’” Embiricos explained.
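OpenAI has not described how these persistent approvals are stored or enforced, but the policy Embiricos outlines is easy to picture in miniature: ask once, remember the answer, and skip the prompt the next time the same action comes up. The sketch below illustrates that pattern only; the class, file name, and prompt text are assumptions, not the Codex permission system.

```python
# Illustrative sketch of the persistent-approval pattern described above: ask
# once, remember the answer, and skip the prompt next time. This is not the
# Codex permission system itself, just the shape of the policy it describes.

import json
import pathlib

class ApprovalStore:
    """Remembers which elevated actions the user has already approved."""

    def __init__(self, path: str = "approvals.json") -> None:
        self.path = pathlib.Path(path)
        self.approved = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    def is_allowed(self, action: str) -> bool:
        """Return True if the action was approved before or is approved now."""
        if action in self.approved:
            return True
        answer = input(f"Allow '{action}' from now on? [y/N] ").strip().lower()
        if answer == "y":
            self.approved.add(action)
            self.path.write_text(json.dumps(sorted(self.approved)))
            return True
        return False

if __name__ == "__main__":
    store = ApprovalStore()
    if store.is_allowed("network: install a package"):
        print("running with network access")
    else:
        print("denied; staying inside the sandbox")
```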
Altman emphasized that the permission architecture signals a broader philosophy about AI safety in agentic systems.
“I think this is going to be really important. I mean, it’s been so clear to us using this, how much you want it to have control of your computer, and how much you need it,” Altman said. “And the way the team built Codex such that you can sensibly limit what’s happening and also pick the level of control you’re comfortable with is important.”
He also acknowledged the dual-use nature of the technology. “We do expect to get to our internal cybersecurity ‘High’ threshold with our models very soon. We’ve been preparing for this. We’ve talked about our mitigation plan,” Altman said. “A real thing for the world to contend with is going to be defending against a lot of capable cybersecurity threats using these models very quickly.”
The same capabilities that make Codex valuable for fixing bugs and refactoring code could, in the wrong hands, be used to discover vulnerabilities or write malicious software—a tension that will only intensify as AI coding agents become more capable.
From Android apps to research breakthroughs: how Codex transformed OpenAI’s own operations
Perhaps the most compelling evidence for Codex’s capabilities comes from OpenAI’s own use of the tool. Sottiaux described how the system has accelerated internal development.
“A Sora Android app is an example of that, where four engineers shipped it internally in only 18 days, and then within the month we gave access to the world,” Sottiaux said. “I had never seen such speed at this scale before.”
Beyond product development, Sottiaux described how Codex has become integral to OpenAI’s research operations.
“Codex is really involved in all parts of the research — making new data sets, investigating its own training runs,” he said. “When I sit in meetings with researchers, they all send Codex off to do an investigation while we’re having a chat, and then it will come back with useful information, and we’re able to debug much faster.”
The tool has also begun contributing to its own development. “Codex also is starting to build itself,” Sottiaux noted. “There’s no screen within the Codex engineering team that doesn’t have Codex running six, eight, ten tasks at a time.”
When asked whether this constitutes evidence of “recursive self-improvement” — a concept that has long concerned AI safety researchers — Sottiaux was measured in his response.
“There is a human in the loop at all times,” he said. “I wouldn’t necessarily call it recursive self-improvement, but it is a glimpse into the future there.”
Altman offered a more expansive view of the research implications.
“There’s two parts of what people talk about when they talk about automating research to a degree where you can imagine that happening,” Altman said. “One is, can you write software, extremely complex infrastructure software, to run training jobs across hundreds of thousands of GPUs and babysit them. And the second is, can you come up with the new scientific ideas that make algorithms more efficient.”
He noted that OpenAI is “seeing early but promising signs on both of those.”
The end of technical debt? AI agents take on the work engineers hate most
One of the more unexpected applications of Codex has been addressing technical debt — the accumulated maintenance burden that plagues most software projects.
Altman described how AI coding agents excel at the unglamorous work that human engineers typically avoid.
“The kind of work that human engineers hate to do — go refactor this, clean up this code base, rewrite this, write this test — this is where the model doesn’t care. The model will do anything, whether it’s fun or not,” Altman said.
He reported that some infrastructure teams at OpenAI that “had sort of like, given up hope that you were ever really going to long term win the war against tech debt, are now like, we’re going to win this, because the model is going to constantly be working behind us, making sure we have great test coverage, making sure that we refactor when we’re supposed to.”
The observation speaks to a broader theme that emerged repeatedly during the briefing: AI coding agents don’t experience the motivational fluctuations that affect human programmers. As Altman noted, a team member recently observed that “the hardest mental adjustment to make about working with these sort of like AI coding teammates, unlike a human, is the models just don’t run out of dopamine. They keep trying. They don’t run out of motivation. They don’t get, you know, they don’t lose energy when something’s not working. They just keep going and, you know, they figure out how to get it done.”
What the Codex app costs and who can use it starting today
The Codex app launches today on macOS and is available to anyone with a ChatGPT Plus, Pro, Business, Enterprise, or Edu subscription. Usage is included in ChatGPT subscriptions, with the option to purchase additional credits if needed.
In a promotional push, OpenAI is temporarily making Codex available to ChatGPT Free and Go users “to help more people try agentic workflows.” The company is also doubling rate limits for existing Codex users across all paid plans during this promotional period.
The pricing strategy reflects OpenAI’s determination to establish Codex as the default tool for AI-assisted development before competitors can gain further traction. More than a million developers have used Codex in the past month, and usage has nearly doubled since the launch of GPT-5.2-Codex in mid-December, building on more than 20x usage growth since August 2025.
Customers using Codex include large enterprises like Cisco, Ramp, Virgin Atlantic, Vanta, Duolingo, and Gap, as well as startups like Harvey, Sierra, and Wonderful. Individual developers have also embraced the tool: Peter Steinberger, creator of OpenClaw, built the project entirely with Codex and reports that since fully switching to the tool, his productivity has roughly doubled across more than 82,000 GitHub contributions.
OpenAI’s ambitious roadmap: Windows support, cloud triggers, and continuous background agents
OpenAI outlined an aggressive development roadmap for Codex. The company plans to make the app available on Windows, continue pushing “the frontier of model capabilities,” and roll out faster inference.
Within the app, OpenAI will “keep refining multi-agent workflows based on real-world feedback” and is “building out Automations with support for cloud-based triggers, so Codex can run continuously in the background—not just when your computer is open.”
The company also announced a new “plan mode” feature that allows Codex to read through complex changes in read-only mode, then discuss with the user before executing. “This means that it lets you build a lot of confidence before, again, sending it to do a lot of work by itself, independently, in parallel to you,” Embiricos explained.
Additionally, OpenAI is introducing customizable personalities for Codex. “The default personality for Codex has been quite terse. A lot of people love it, but some people want something more engaging,” Embiricos said. Users can access the new personalities using the /personality command.
Altman also hinted at future integration with ChatGPT’s broader ecosystem.
“There will be all kinds of cool things we can do over time to connect people’s ChatGPT accounts and leverage sort of all the history they’ve built up there,” Altman said.
Microsoft still dominates enterprise AI, but the window for disruption is open
The Codex launch occurs as most enterprises have moved beyond single-vendor strategies. According to the Andreessen Horowitz survey, “81% now use three or more model families in testing or production, up from 68% less than a year ago.”
Despite the proliferation of AI coding tools, Microsoft continues to dominate enterprise adoption through its existing relationships. “Microsoft 365 Copilot leads enterprise chat though ChatGPT has closed the gap meaningfully,” and “Github Copilot is still the coding leader for enterprises.” The survey found that “65% of enterprises noted they preferred to go with incumbent solutions when available,” citing trust, integration, and procurement simplicity.
However, the survey also suggests significant opportunity for challengers: “Enterprises consistently say they value faster innovation, deeper AI focus, and greater flexibility paired with cutting edge capabilities that AI native startups bring.”
OpenAI appears to be positioning Codex as a bridge between these worlds. “Codex is built on a simple premise: everything is controlled by code,” the company stated. “The better an agent is at reasoning about and producing code, the more capable it becomes across all forms of technical and knowledge work.”
The company’s ambition extends beyond coding. “We’ve focused on making Codex the best coding agent, which has also laid the foundation for it to become a strong agent for a broad range of knowledge work tasks that extend beyond writing code.”
When asked whether AI coding tools could eventually move beyond early adopters to become mainstream, Altman suggested the transition may be closer than many expect.
“Can it go from vibe coding to serious software engineering? That’s what this is about,” Altman said. “I think we are over the bar on that. I think this will be the way that most serious coders do their job — and very rapidly from now.”
He then pivoted to an even bolder prediction: that code itself could become the universal interface for all computer-based work.
“Code is a universal language to get computers to do what you want. And it’s gotten so good that I think, very quickly, we can go not just from vibe coding silly apps but to doing all the non-coding knowledge work,” Altman said.
At the close of the briefing, Altman urged journalists to try the product themselves: “Please try the app. There’s no way to get this across just by talking about it. It’s a crazy amount of power.”
For developers who have spent careers learning to write code, the message was clear: the future belongs to those who learn to manage the machines that write it for them.