OpenAI says AI browsers may always be vulnerable to prompt injection attacks

OpenAI says prompt-injection attacks targeting AI browsers may never be fully solved. The company outlines how it is strengthening ChatGPT Atlas with layered defences, reinforcement-learning-based attackers, and rapid-response testing to reduce risks associated with agentic AI browsing.

Dec 22, 2025 - 18:24
 8
OpenAI says AI browsers may always be vulnerable to prompt injection attacks
Image Credits: OpenAI

Even as OpenAI works to strengthen the defences of its Atlas AI browser against cyberattacks, the company acknowledges that prompt-injection attacks are unlikely to disappear anytime soon. Prompt injection is a class of attack that manipulates AI agents into following malicious instructions, often hidden within web pages, documents, or emails. The persistence of this threat raises broader concerns about the safety of AI agents operating on the open web.

“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” OpenAI wrote in a blog post published Monday, outlining the steps it is taking to reinforce Atlas against constant attack attempts. The company also conceded that enabling “agent mode” in ChatGPT Atlas significantly “expands the security threat surface.”

OpenAI launched its ChatGPT Atlas browser in October, and security researchers quickly demonstrated weaknesses. Some studies have shown that embedding just a few carefully crafted instructions in a Google Docs file can alter the underlying browser’s behaviour. On the same day, Brave published a blog post arguing that indirect prompt injection represents a structural challenge for all AI-powered browsers, including Perplexity Comet.

OpenAI is not alone in this assessment. Earlier this month, the National Cyber Security Centre in the U.K. warned that prompt injection attacks targeting generative AI systems “may never be totally mitigated,” cautioning that such vulnerabilities could expose websites and users to data breaches. Rather than assuming these attacks can be eliminated, the agency advised security professionals to focus on reducing their likelihood and limiting their impact.

For its part, OpenAI said: “We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it.”

To address this ongoing risk, OpenAI says it has adopted a proactive, rapid-response security cycle. The company claims this approach is already showing early promise by helping internal teams uncover new attack strategies before they are exploited in real-world scenarios.

This philosophy aligns with what competitors such as Anthropic and Google have emphasised: layered defences and continuous stress testing are essential to counter persistent prompt-based threats. Google, for example, has recently highlighted architectural and policy-level safeguards designed specifically for agentic AI systems.

Where OpenAI diverges is in its use of an “LLM-based automated attacker.” This system is a bot trained via reinforcement learning to behave like a hacker, actively searching for ways to inject malicious instructions into an AI agent.

The automated attacker can test exploits in a simulated environment before deploying them, observing how the target AI reasons and what actions it might take. It then refines its tactics and repeats the process. Because this system has insight into the target AI’s internal reasoning — access that external attackers lack — OpenAI believes it can identify vulnerabilities faster than adversaries operating in the wild.

This method reflects a broader trend in AI safety research: building agents specifically designed to uncover edge cases and failure modes through rapid, large-scale simulation.

“Our reinforcement learning–trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens or even hundreds of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

Image Credits:OpenAI

In one demonstration, OpenAI showed how the automated attacker inserted a malicious email into a user’s inbox. When the AI agent later reviewed the inbox, it followed hidden instructions in the message and sent a resignation email instead of drafting an out-of-office reply. According to OpenAI, after the latest security update, agent mode successfully detected the injection attempt and alerted the user.

While acknowledging that prompt injection cannot be fully secured against, OpenAI says it is relying on large-scale testing and faster patch cycles to harden Atlas before new techniques emerge in real-world attacks. The company declined to share whether recent updates have measurably reduced successful prompt injections, but said it has collaborated with third parties to strengthen Atlas since before launch.

Wiz principal security researcher Rami McCarthy says reinforcement learning can help systems adapt to attacker behaviour, but it is only one piece of the puzzle.

“A useful way to reason about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch.

“Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access,” he said. “Many current recommendations reflect that trade-off. Limiting logged-in access primarily reduces exposure, while requiring review or confirmation requests constrains autonomy.”

Those principles align with OpenAI’s own user recommendations. Atlas is designed to request confirmation before sending messages or making payments, and OpenAI advises users to provide narrow, specific instructions rather than broad access with vague directives such as telling an agent to “take whatever action is needed.”

“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI said.

Despite OpenAI’s focus on mitigating prompt injection risks, McCarthy urges caution when weighing the benefits of agentic browsers against their current security profile.

“For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile,” he told TechCrunch. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the trade-offs are still very real.”

What's Your Reaction?

Like Like 0
Dislike Dislike 0
Love Love 0
Funny Funny 0
Angry Angry 0
Sad Sad 0
Wow Wow 0
TechAmerica.ai Staff TechAmerica.ai’s editorial team, consisting of expert editors, writers, and researchers, crafts accurate, clear, and valuable content focused on technology and education. We deliver in-depth technology news and analysis, with a special emphasis on founders and startup teams, covering funding trends, innovative startups, and entrepreneurial insights to empower our readers.