Claude Mythos: The Model Too Dangerous to Ship

Anthropic did something unusual on April 7th. They announced their most powerful model, showed the benchmarks, and then said you can't have it.
Claude Mythos Preview is not available through the API. You can't use it in Claude Pro. There's no waitlist. Anthropic built the thing, tested it extensively, and concluded that the cybersecurity risks outweigh the benefits of a public release. That's a first for a major AI lab, and I think it tells us something about where this is all heading.
What Mythos actually scores
The benchmarks are hard to ignore. 93.9% on SWE-bench Verified, 13 points above Opus 4.6 and roughly 16 points above GPT-5.4. 94.6% on GPQA Diamond (graduate-level science questions). 82% on Terminal-Bench 2.0. 97.6% on USAMO 2026, the USA Math Olympiad. The SWE-bench lead over every publicly available model is double-digit; the other margins are real but smaller — Gemini 3.1 Pro trails by just 0.3 points on GPQA Diamond, and GPT-5.4 by about 7 on Terminal-Bench 2.0.
For context, here's where the current flagship models sit:
| Model | SWE-bench Verified | GPQA Diamond | Terminal-Bench 2.0 |
|---|---|---|---|
| Claude Mythos Preview | 93.9% | 94.6% | 82.0% |
| Claude Opus 4.6 | 80.8% | 91.3% | ~70% |
| GPT-5.4 | ~78% | ~88% | 75.1% |
| Gemini 3.1 Pro | ~79% | 94.3% | ~72% |
That SWE-bench gap is enormous. Going from 80% to 94% means Mythos can solve real-world GitHub issues that stump every other model. And it does it consistently.
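One way to see the size of that gap is to look at failure rates rather than solve rates: going from 80.8% to 93.9% cuts the share of unsolved issues from about 19% to about 6%. A quick sketch using the table's numbers (the ~ figures are approximations taken from the table, not exact scores):

```python
# SWE-bench Verified scores from the table above, in percent.
scores = {
    "Claude Mythos Preview": 93.9,
    "Claude Opus 4.6": 80.8,
    "GPT-5.4": 78.0,       # "~78" in the table
    "Gemini 3.1 Pro": 79.0, # "~79" in the table
}

mythos_fail = 100 - scores["Claude Mythos Preview"]  # ~6.1% unsolved

for model, score in scores.items():
    if model == "Claude Mythos Preview":
        continue
    fail = 100 - score
    reduction = (fail - mythos_fail) / fail
    print(f"{model}: unsolved rate {fail:.1f}% -> {mythos_fail:.1f}% "
          f"({reduction:.0%} fewer failures)")
# e.g. against Opus 4.6, Mythos eliminates roughly 68% of the remaining failures
```

Framed this way, Mythos doesn't just score higher; it clears out about two-thirds of the issues that still stump today's best public model.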
The cybersecurity problem
Here's where it gets uncomfortable. During testing, Anthropic found that Mythos can identify and exploit zero-day vulnerabilities in every major operating system and every major web browser. Not theoretical vulnerabilities. Real ones. Some of them had been sitting in production code for over a decade.
The oldest one Anthropic reported finding was a 27-year-old bug in OpenBSD, an operating system that people specifically choose for its security track record. Mythos found it, understood why it was exploitable, and demonstrated a working exploit. That's not a party trick. That's a capability that changes the threat model for basically everyone running software.
Anthropic's risk report describes the model as "strikingly capable at computer security tasks." I've read enough corporate risk disclosures to know that when a company uses language like that about their own product, the internal discussions were probably a lot more alarming.
Project Glasswing
Rather than shelving Mythos entirely, Anthropic launched Project Glasswing, a consortium of 12 companies that get access to the model for defensive cybersecurity work only. The list includes Amazon, Apple, Cisco, Google, JPMorgan Chase, Microsoft, and Nvidia. Anthropic put up $100 million in usage credits to fund it.
The pitch is straightforward: if a model this capable exists, and if future models from other labs will eventually reach similar capabilities, then using it to find and patch vulnerabilities now is better than waiting for an attacker to get there first. That reasoning makes sense to me, even if the execution details are still murky.
What I find interesting is the access model. These aren't API keys. Glasswing partners run Mythos in controlled environments with audit trails, and the outputs are specifically scoped to vulnerability discovery and patch generation. Anthropic is betting that narrow, supervised access is safer than either broad release or no release at all.
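Anthropic hasn't published how Glasswing access actually works, but the shape described — controlled environments, audit trails, outputs scoped to vulnerability discovery and patch generation — suggests something like the following wrapper. Everything here (class names, the task allow-list, the log format) is invented for illustration; none of it reflects Anthropic's real implementation:

```python
import datetime
import json

# Hypothetical allow-list of defensive tasks; purely illustrative.
ALLOWED_TASKS = {"vulnerability_discovery", "patch_generation"}


class GlasswingSession:
    """Sketch of scoped model access: every request is checked against an
    allow-list of defensive tasks, and every attempt (allowed or not) is
    appended to an audit log."""

    def __init__(self, partner: str, audit_path: str = "audit.log"):
        self.partner = partner
        self.audit_path = audit_path

    def request(self, task: str, payload: str) -> str:
        if task not in ALLOWED_TASKS:
            self._log(task, payload, allowed=False)
            raise PermissionError(f"task {task!r} is outside the defensive scope")
        self._log(task, payload, allowed=True)
        # A real deployment would invoke the model inside the controlled
        # environment here; this sketch returns a placeholder instead.
        return f"[model output for {task}]"

    def _log(self, task: str, payload: str, allowed: bool) -> None:
        entry = {
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "partner": self.partner,
            "task": task,
            "allowed": allowed,
            "payload_chars": len(payload),  # log the size, not the content
        }
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
```

The design point is the asymmetry: denied requests are still logged, so the audit trail captures what partners *tried* to do, not just what they were permitted to do.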
The Claude model family timeline
To understand how big the Mythos jump is, it helps to see where each generation landed.
2024 — the three-tier bet pays off
March — Claude 3 (Haiku, Sonnet, Opus). The first three-tier lineup. Opus was the model where people started seriously using Claude for professional coding. Before this, Claude was the "other chatbot." After this, it had a real identity.
June — Claude 3.5 Sonnet. The surprise. Matched Opus-level performance at Sonnet pricing. This single release probably did more for Claude's reputation than anything else that year. If you were around for it, you remember the "wait, Sonnet is this good now?" moment.
October — Claude 3.5 Sonnet v2. Introduced Computer Use. First major AI model to directly control a computer interface. The demo was wild, the edges were rough, but it planted the flag.
2025 — the coding model year
February — Claude 3.7 Sonnet. Extended thinking arrived. The model could pause, reason step-by-step, then respond. Multi-step problems that used to require prompt chains became single-shot.
May — Claude 4 Sonnet. Professional-grade coding capabilities. This is when Claude Code became a daily driver for people doing serious development work. Not a novelty, not a demo. A tool you actually relied on.
September — Claude Sonnet 4.5. 77.2% on SWE-bench Verified, the highest score at the time. Crowned "the world's best coding model." Could maintain focus for 30+ hours on agentic tasks.
November — Claude Opus 4.5. A 67% price cut. Premium intelligence became affordable for everyday use instead of being reserved for special occasions.
2026 — context, agents, and then Mythos
February 5 — Claude Opus 4.6. 1M context window at standard pricing (no surcharge). Native agent teams and multi-agent collaboration. 80.8% SWE-bench.
February 17 — Claude Sonnet 4.6. 79.6% SWE-bench at $3/MTok input. The gap between Sonnet and Opus shrank to 1.2 points, and for most workloads Sonnet became the obvious default.
April 7 — Claude Mythos Preview. 93.9% SWE-bench. Not publicly available.
The pattern from 2024 through early 2026 was steady, predictable improvement: each generation gained a few points on the main benchmarks — Sonnet 4.5's 77.2% to Opus 4.6's 80.8%, for example. Mythos broke that pattern. A 13-point jump in a single generation isn't incremental progress. Something changed in the training approach, and Anthropic hasn't explained what.
What this means if you're building with Claude
Practically, nothing changes for your day-to-day work. Opus 4.6 and Sonnet 4.6 are still the models you use. Sonnet at 79.6% on SWE-bench for $3/MTok input is still absurdly good value. Opus at 80.8% with 1M context handles the complex architecture work.
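The "absurdly good value" claim is easy to sanity-check on the input side. The workload numbers below are made up for illustration, and output pricing isn't quoted in this post, so only input cost is computed:

```python
SONNET_INPUT_PER_MTOK = 3.00  # USD per million input tokens, per the pricing above


def input_cost(tokens: int) -> float:
    """Input-side cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * SONNET_INPUT_PER_MTOK


# A hypothetical heavy agentic session: 50 requests averaging 40k input tokens.
session_tokens = 50 * 40_000  # 2 million tokens
print(f"${input_cost(session_tokens):.2f}")  # -> $6.00
```

Two million input tokens — a full day of serious agentic coding — costs six dollars on the input side. That's the economic backdrop against which Mythos's restricted release sits.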
But Mythos is a signal. The gap between "publicly available" and "what the lab has in the building" is widening. That gap used to be a few months of training iteration. Now it's a qualitative capability jump that the lab itself considers too risky for general access.
I don't think Anthropic is the only lab sitting on models with capabilities they're nervous about. OpenAI and DeepMind have their own internal evaluation frameworks for deciding what ships and what doesn't. Mythos is just the first time a lab publicly acknowledged the gap and explained their reasoning.
The part I keep coming back to
Twenty-seven years. That OpenBSD bug sat in production for twenty-seven years and nobody found it. Generations of security researchers, countless audits, bug bounties, fuzzing tools, static analysis, formal verification efforts. A model found it.
That's the thing about Mythos that sticks with me more than the benchmarks. It isn't that the model is smart. It's that it's a different kind of thorough. It can hold an entire codebase in context, reason about subtle interaction patterns between components, and surface bugs that humans miss not because we're bad at security, but because the search space is too large for any individual or team to cover exhaustively.
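The search-space point is easy to make concrete. Even counting only pairwise interactions between components, the number of things to check grows quadratically, and deeper interaction chains explode combinatorially. The component count below is illustrative, not from any real codebase:

```python
from math import comb

components = 2_000  # hypothetical module/function count for a large codebase

pairs = comb(components, 2)    # two-component interactions
triples = comb(components, 3)  # three-component interaction chains

print(f"{pairs:,} pairwise interactions")     # 1,999,000
print(f"{triples:,} three-way interactions")  # 1,331,334,000
```

No human team audits two million pairings, let alone a billion triples. A model that can hold the whole codebase in context and grind through that space methodically is doing something qualitatively different from a human review.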
Whether that capability ends up making us safer or less safe depends entirely on who controls it and how. Anthropic's answer, for now, is "we do, tightly." I'm not sure that scales, but I don't have a better answer either.


