China’s Zhipu AI Matches Anthropic’s Claude Mythos in Security Bug Detection

A new model from China’s Zhipu AI is reportedly matching or closely approaching the performance of Anthropic’s Claude Mythos — one of the world’s most advanced AI systems for finding security vulnerabilities.

According to benchmarks published by the cybersecurity company Semgrep and widely reported on June 28, 2026, Zhipu’s GLM-5.2 (an open-weight model) achieved strong results on security bug detection tasks, performing comparably to Mythos and outperforming earlier versions of Claude in specific scenarios.

This development is significant because Mythos was designed as a specialized frontier model for autonomous cybersecurity work — including discovering zero-days and chaining exploits. The fact that a Chinese lab has produced a competitive open-weight alternative highlights how quickly the global AI landscape is shifting, even as the United States imposes strict export controls on advanced chips and models.

Background: What Is Claude Mythos?

Claude Mythos Preview (launched by Anthropic in April 2026) was positioned as one of the most powerful AI systems ever built for offensive and defensive cybersecurity.

Key capabilities demonstrated by Mythos include:

Discovering thousands of zero-day vulnerabilities across major operating systems and browsers.
Autonomously chaining exploits to compromise simulated corporate networks (with notable success rates in UK AI Security Institute evaluations).
Multi-step reasoning to identify, exploit, and even patch security flaws with minimal human guidance.

Because of these capabilities, Anthropic chose not to release Mythos broadly. Access was reportedly limited to a small number of trusted partners (around 50 elite organizations) under strict controls. The model raised significant national security concerns due to its dual-use potential.

Zhipu AI’s GLM-5.2: The Chinese Challenger

Zhipu AI (also known as Z.ai) released GLM-5.2 in mid-June 2026. The model was initially rolled out to coding plan members and later made available with open weights.

Key highlights from independent testing:

In Semgrep’s benchmarks on IDOR (Insecure Direct Object Reference) detection — a common and critical web security vulnerability — GLM-5.2 achieved a 39% F1 score.
This outperformed Claude Code (Opus 4.6/4.8), which scored around 32–37% in a comparable setup (listed as prompt plus SDK in the breakdown below).
When given appropriate scaffolding or instructions, GLM-5.2 reportedly matches or closely approaches Claude Mythos performance on security bug detection tasks.
It delivered these results at a much lower cost — roughly $0.17 per vulnerability found.
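
To make the vulnerability class concrete, here is a minimal Python sketch of an IDOR and one way to close it. The invoice store, user IDs, and function names are hypothetical illustrations; they are not taken from Semgrep's benchmark or from any model's output.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    1: Invoice(1, owner_id=100, amount_cents=2500),
    2: Invoice(2, owner_id=200, amount_cents=9900),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    """IDOR: trusts the client-supplied ID and never checks ownership."""
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    """Same lookup, but rejects objects the caller does not own."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        # One error for "missing" and "not yours" avoids leaking which IDs exist.
        raise PermissionError("invoice not found or not accessible")
    return invoice
```

In the vulnerable version, user 100 can read invoice 2 simply by changing the ID in the request. A detector, whether a model or a static analysis rule, has to notice that the authorization check is missing, which is why this class of bug tends to be hard to catch with simple pattern matching.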

Importantly, GLM-5.2 is an open-weight model, meaning researchers and organizations can download and run it locally or self-host it. This contrasts sharply with Anthropic’s decision to keep Mythos tightly restricted.

Benchmark Breakdown: How Close Is It Really?

GLM-5.2 (Zhipu)

Setup: Prompt only
IDOR Detection F1 Score: 39%
Notes: Open-weight model with low-cost deployment.

Claude Code (Opus 4.6/4.8)

Setup: Prompt + SDK
IDOR Detection F1 Score: 32–37%
Notes: Strong performance, but behind GLM-5.2 in this benchmark.

Claude Mythos

Setup: With scaffolding
IDOR Detection F1 Score: No figure given; described as matching or very close to the top results
Notes: Specialized frontier model with restricted access.

Semgrep Multimodal Pipeline

Setup: Purpose-built harness
IDOR Detection F1 Score: 53–61%
Notes: Not a pure large language model comparison; uses a dedicated security testing pipeline.

The results show that while specialized scaffolding and purpose-built tools still lead overall, a publicly available Chinese model is now competitive with (and in some narrow tests, ahead of) leading U.S. models on pure LLM-based vulnerability detection.

Researchers noted that GLM-5.2’s performance was particularly impressive given its cost and accessibility.
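
For readers less familiar with the metrics, the sketch below shows how an F1 score and a cost-per-finding figure are typically computed. The counts and the spend are invented so that the arithmetic lands on the reported values; they are not Semgrep's data. The cost definition (total inference spend divided by confirmed findings) is also an assumption, since the source does not spell it out.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing was found."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, true_pos: int) -> float:
    """Assumed definition: total spend divided by confirmed vulnerabilities."""
    if true_pos <= 0:
        raise ValueError("need at least one confirmed finding")
    return total_cost_usd / true_pos


# Made-up inputs: 39 real bugs flagged, 61 false alarms, 61 real bugs missed.
print(round(f1_score(39, 61, 61), 2))          # 0.39
# Made-up inputs: $17.00 of inference spend for 100 confirmed findings.
print(round(cost_per_finding(17.00, 100), 2))  # 0.17
```

Because F1 penalizes both false alarms and missed bugs, a 39% score leaves considerable room for either kind of error, which is worth keeping in mind when comparing models a few points apart.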

Why This Matters: Strategic and Geopolitical Implications

1. China Is Closing the Gap in Specialized Capabilities

Despite U.S. export controls on advanced semiconductors, Chinese labs continue to make rapid progress in targeted domains like coding and cybersecurity. Zhipu’s reported near-parity with Mythos on bug detection suggests that raw model scale isn’t the only path to high performance; training and architecture choices matter too.

2. Open vs. Closed Model Strategies

Anthropic chose safety and control by restricting Mythos.
Zhipu chose openness and accessibility with GLM-5.2.

This philosophical difference is reshaping the global AI ecosystem. Open-weight models from China are gaining traction among developers who want powerful tools without relying on U.S. API providers or facing usage restrictions.

3. Cybersecurity Arms Race Intensifies

Both offensive and defensive cybersecurity capabilities are advancing quickly. If Chinese models can match top U.S. systems at finding vulnerabilities, this has implications for:

National critical infrastructure protection
Supply chain security
The balance of power in cyber operations

Security teams worldwide may soon have access to extremely capable (and cheap) AI tools for vulnerability hunting — regardless of which country developed them.

4. Questions About Export Controls

The development raises difficult questions about the effectiveness of current U.S. chip export restrictions. While they have slowed China’s progress in some areas, they have not prevented competitive advances in software and model capabilities.

What Zhipu’s Achievement Does Not Mean

It’s important to maintain perspective:

GLM-5.2 still lags behind the absolute top U.S. frontier models (including Mythos and leading OpenAI systems) in many general capabilities beyond narrow security tasks.
Matching performance on one benchmark (even an important one) does not equal overall parity.
Real-world deployment, safety alignment, and reliability at scale remain significant challenges for all labs.

However, the gap in this specific high-stakes domain has narrowed dramatically — and at a much lower price point.

Outlook: What Happens Next?

This development is likely to accelerate several trends:

Increased focus by U.S. labs on specialized security models and defensive tooling.
Greater scrutiny of open-weight model proliferation and potential misuse.
More investment by Chinese labs in cybersecurity and agentic AI capabilities.
Ongoing policy debates in Washington about how to maintain technological leadership while managing proliferation risks.

For cybersecurity professionals, the message is clear: AI-powered vulnerability detection is no longer the exclusive domain of a handful of U.S. frontier labs. Powerful tools are becoming more widely accessible than many expected.

Frequently Asked Questions

What is GLM-5.2?
GLM-5.2 is the latest major model from China’s Zhipu AI. It was released in mid-June 2026 with open weights and has shown strong performance on coding and security tasks.

How does it compare to Claude Mythos?
On security bug detection benchmarks (particularly IDOR vulnerabilities), GLM-5.2 matches or closely approaches Mythos performance when properly prompted, and outperforms earlier Claude versions in some tests. It remains behind in broader general capabilities.

Is the model available to everyone?
Yes. Unlike Mythos (which is heavily restricted), GLM-5.2’s weights are open, allowing researchers and organizations to download and run it.

Why is this significant?
It shows China achieving competitive performance in a critical national security-relevant domain (cybersecurity) despite U.S. technology restrictions, and doing so with an open model at lower cost.

Does this mean China has caught up overall?
Not yet in most general benchmarks. However, in specialized areas like automated vulnerability detection, the gap has narrowed significantly.

Bottom Line

Zhipu AI’s GLM-5.2 has demonstrated that Chinese models can now match Anthropic’s specialized Claude Mythos on security bug detection tasks. This is a notable milestone in the global AI competition and underscores how quickly capabilities are diffusing, even as the U.S. attempts to maintain a lead through export controls and restricted access.

For cybersecurity teams, developers, and policymakers, this is another signal that the AI-powered security landscape is evolving rapidly and becoming more multipolar.

Tags: Zhipu AI, GLM-5.2, Claude Mythos, AI cybersecurity, vulnerability detection, China AI, Anthropic, open-weight models, AI race

What do you think about Chinese labs matching U.S. models in critical security capabilities? Should open-weight models like GLM-5.2 be more widely embraced or more tightly regulated? Share your thoughts in the comments below.
