Anthropic drops safety pause pledge in Responsible Scaling Policy 3.0 amid AI race and Pentagon pressure in 2026

Anthropic Drops Safety Pause Pledge in AI Race

In a seismic shift that underscores the brutal tension between safety and speed in frontier AI, Anthropic announced on February 25, 2026, a major overhaul to its Responsible Scaling Policy (RSP), effectively dropping its flagship pledge to pause development and deployment of models if capabilities surge ahead of safety mitigations. The decision, detailed in RSP version 3.0 and reported exclusively by TIME, marks a turning point for the company long hailed as the most safety-conscious among leading labs. Anthropic’s leaders, including chief science officer Jared Kaplan, cited the relentless pace of AI progress and the impracticality of unilateral pauses in a hyper-competitive landscape where rivals like OpenAI, Google, and xAI show no signs of slowing.

This pivot arrives amid mounting pressure. A day earlier, on February 24, Defense Secretary Pete Hegseth gave CEO Dario Amodei an ultimatum with a Friday deadline: grant unrestricted military access to Claude models or face contract termination, blacklisting as a supply-chain risk, and even invocation of the Defense Production Act. The clash highlights how national security demands are colliding with ethical commitments, forcing Anthropic to recalibrate its stance. By abandoning the hard “pause” commitment, once a core promise never to train systems without guaranteed risk controls, the company signals a pragmatic embrace of the speed race, even as it vows to match or exceed competitors’ safety efforts and to delay progress if it leads the pack and catastrophe risks loom large.

The backstory traces to Anthropic’s 2023 RSP, which committed to halting scaling at certain risk thresholds unless mitigations were demonstrably adequate. That policy positioned the lab as a bulwark against unchecked advancement, attracting talent and partners wary of existential threats. But as frontier models advanced jaggedly—excelling in math, coding, and agentic tasks while struggling with long-horizon reliability—the binary “pause or proceed” framework proved untenable. Kaplan told TIME the company concluded that stopping training unilaterally “wouldn’t actually help anyone,” especially with peers racing ahead.

Recent releases illustrate the agentic autonomy boom driving this shift. Anthropic’s Claude Opus 4.6 (February 5, 2026) and Sonnet 4.6 (February 17) introduced advanced features like coordinating teams of autonomous agents for parallel task execution, desktop control via Claude Code, and expanded working memory for handling complex, multi-step workflows. These capabilities enable longer unsupervised sessions, blurring lines between tools and agents. Sonnet 4.6 navigates web applications and forms at near-human levels, while both models support extended reasoning chains—hallmarks of the “agentic” era where AI acts independently over hours or days.

The International AI Safety Report 2026, released February 3 and led by Yoshua Bengio with input from over 100 experts across 30+ countries, underscores these jagged gains. It highlights rapid progress in specialized domains like coding and math, with frontier models approaching or exceeding expert human performance on benchmarks, though performance declines on longer, messier real-world tasks involving resource constraints or multi-agent coordination. Experts forecast continued specialization advances by 2028–2030, but disagree on timelines and scales. The report emphasizes emerging risks from these capabilities, including misuse in cyberattacks.

Cybersecurity threats amplify the stakes. The IBM X-Force Threat Intelligence Index (latest editions) warns of AI-accelerated attacks: generative models aiding phishing, malware generation, vulnerability discovery, and social engineering at unprecedented scale. Adversaries can now craft hyper-personalized campaigns or automate exploit chains, shrinking detection windows. As agentic systems gain autonomy, risks compound—hallucinations could lead to faulty decisions in high-stakes domains, while jailbreaks or misalignments enable harmful outputs.

Competitive pressures are unrelenting. OpenAI’s reduced-guardrail ChatGPT variants on genai.mil, Google’s Gemini integrations, and Grok’s clearance for classified use at xAI exemplify compliance with the Pentagon’s “all lawful use” demands. Anthropic’s ethical red lines, which bar autonomous weapons and domestic surveillance, clash with the DoD’s insistence on unrestricted use, putting its $200 million contract and classified access at risk. xAI’s rapid entry erodes Anthropic’s former edge, pushing the lab to prioritize speed to retain relevance.

Enterprise dynamics add layers. Frontier alliances push customized, high-reliability models, while tools like Cursor battle for developer mindshare with agentic coding aids. Hardware bets intensify: Nvidia’s dominance fuels scaling, but investments in sovereign infrastructure and alternatives signal diversification amid geopolitical tensions.

Ethical debates rage. Critics argue that dropping the pause pledge normalizes risk acceleration and could erode trust in safety-focused labs. Supporters see it as realism: unilateral restraint cedes ground to less scrupulous actors, including state-backed efforts. RSP v3.0 retains commitments to robust mitigations but replaces the unilateral pause pledge with competitive matching, on the premise that safety thrives in coordinated, not isolated, efforts.

Risks remain stark. Misuse scenarios, from bioweapons design to disinformation, grow with agentic power. Hallucinations in autonomous loops could cascade into failures in defense, finance, or critical infrastructure. The Pentagon feud exposes governance gaps: private policies can’t bind sovereign needs, yet forcing compliance risks chilling innovation.

Looking to the rest of 2026, the outlook is explosive yet uncertain. Governance lags progress, with no federal consensus on regulation. The International AI Safety Report calls for better risk management, but implementation varies. If Anthropic navigates these pressures without fully abandoning its principles, it could model balanced advancement. Failure risks a race-to-the-bottom dynamic, where speed trumps safeguards.

Ultimately, February 2026 crystallizes the AI wars’ core dilemma: frontier progress demands velocity, but unchecked velocity courts catastrophe. Anthropic’s pivot isn’t surrender—it’s adaptation in a world where pausing alone may mean falling behind, and leading without caution could prove fatal. As agentic systems proliferate and threats mount, 2026 will test whether humanity can govern explosive innovation before it outpaces our wisdom.

Ethan Brooks covers the tech that’s reshaping how we move, work, and think for VFuture Media. He was at CES 2026 in Las Vegas when the world got its first real look at humanoid robots, AI-powered vehicles, and Samsung’s tri-fold phone. He writes about AI, EVs, gadgets, and green tech every week. No hype. No filler.

