Opus 4.8 → Fable 5 (now)

June 9, 2026 · Anthropic

Model Launch

By Sam Taylor with Samwise · Jun 10, 2026

On SWE-bench Verified jumping to 95%, the safety-rerouting architecture that replaces refusals with silent fallbacks, and what it means that this landed five days after the recursive self-improvement paper.

Anthropic ships Mythos to everyone. At $10/M, the price is the argument.

[Chart: source lean on this story. The scale runs from Anti-AI through Skeptic, Neutral, and Pro (practical) to Pro (hyped), with a marker for the average.]

Anthropic released Claude Fable 5 on June 9. It is the first publicly available Mythos-class model — the tier that, until now, only existed for Project Glasswing partners working on cyberdefense and critical infrastructure. Starting yesterday, it is available to anyone with an API key, on all the usual clouds.

The pricing is $10 per million input tokens and $50 per million output. That is 2× Opus 4.8 standard, and also exactly what Opus 4.8 Fast Mode costs. It is less than half what Mythos Preview cost when it shipped earlier this year. The price signal is the part I want to sit with, because it says something specific: Anthropic is treating Fable 5 as the new performance-tier standard, not as a premium line item. The frontier just got cheaper.
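To make the price comparison concrete, here is a minimal sketch of the per-request arithmetic at the list prices quoted in this piece. The function name, the "opus-4.8-standard" label, and the example token counts are my own illustrative assumptions, not anything from Anthropic's documentation.

```python
# List prices in USD per million tokens, as quoted in this article.
PRICES = {
    "claude-fable-5": {"input": 10.00, "output": 50.00},
    # Illustrative label only; not a confirmed model ID.
    "opus-4.8-standard": {"input": 5.00, "output": 25.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price (no caching, no discounts)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
fable_cost = request_cost("claude-fable-5", 20_000, 4_000)
opus_cost = request_cost("opus-4.8-standard", 20_000, 4_000)
print(f"Fable 5: ${fable_cost:.2f} | Opus 4.8: ${opus_cost:.2f} | ratio: {fable_cost / opus_cost:.1f}x")
```

Because both the input and output prices double, the ratio stays at 2x regardless of the input/output mix. The open question is whether fewer failed attempts offset that.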

Five days before this launch, Anthropic's Institute published the recursive self-improvement paper calling for a globally coordinated pause option and reporting that Claude now writes more than 80% of its own merged production code. I covered that piece separately. The timing here is worth flagging anyway. A lab that ships its most capable-ever public model five days after publishing research calling for a conditional pause option is not being incoherent — it is making a specific argument: progress continues, and governance architecture is the question, not capability architecture. Whether you find that argument convincing depends on how much you trust Anthropic to actually build the governance side. I don't have a confident answer there. But the argument is a real one, and it deserves engagement rather than a headline about irony.

Anyway. What is Fable 5, and should you upgrade?

What the benchmarks actually say

SWE-bench Verified — the canonical real-world software-engineering benchmark, methodology published — scores Fable 5 at 95.0%. Opus 4.8 is at 88.6%. That 6.4-point gain at the top of the capability curve is not cosmetic. At 88%+, each additional point on SWE-bench Verified corresponds to increasingly difficult, edge-case-dense tasks — the kind that fail regardless of how you tune the prompt.
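One way to see why the gain is not cosmetic is to look at failure rates instead of pass rates. The sketch below uses only the two scores quoted above; the helper name is mine, and it describes benchmark tasks only, not how your own workload will behave.

```python
def failure_rate(pass_rate_pct: float) -> float:
    """Convert a benchmark pass rate (percent) into a failure rate (percent)."""
    return 100.0 - pass_rate_pct


opus_fail = failure_rate(88.6)   # about 11.4% of tasks unresolved
fable_fail = failure_rate(95.0)  # 5.0% of tasks unresolved

# Share of Opus 4.8's unresolved tasks that no longer fail at Fable 5's score.
relative_reduction = (opus_fail - fable_fail) / opus_fail
print(f"Unresolved tasks: {opus_fail:.1f}% -> {fable_fail:.1f}% "
      f"({relative_reduction:.0%} fewer)")
```

Read this way, a 6.4-point gain in pass rate cuts the unresolved share of the benchmark by more than half.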

SWE-bench Pro — Scale AI's harder variant, less susceptible to training-data leakage — shows a larger gap. Fable 5: 80.3%. Opus 4.8: 69.2%. GPT-5.5: 58.6%. The 11-point gap over Opus 4.8 on the harder benchmark is the number I'd weight most when deciding whether this is a real step or a saturated-benchmark artifact. It is real.

95%: Fable 5 on SWE-bench Verified, up from 88.6% for Opus 4.8 (source: BenchLM.ai)

A caveat I'll state explicitly because it matters: the SWE-bench Verified numbers come from third-party aggregators citing Anthropic's announcement page, and I cannot independently verify them against the raw leaderboard data today. The methodology is public. The numbers are consistent across sources. But independent reproduction studies don't exist yet for Fable 5's full benchmark suite, and that matters for anyone deciding to stake production systems on these claims.

The safety architecture is the part most coverage is underweighting

Fable 5 and Mythos 5 are the same base model. That sentence is the whole story. The difference is what happens with a specific slice of queries.

When Fable 5's classifiers detect a request in cybersecurity, biology/chemistry, or model-distillation territory, the request is silently rerouted to Claude Opus 4.8. Not refused. Not flagged. Answered — by a different model. This triggers in less than 5% of sessions. For the other 95%+, Fable 5 performs identically to Mythos 5.

Mythos 5 lifts those classifiers for vetted Project Glasswing partners. Cyberdefense organizations and critical infrastructure teams. Not generally available.

The rerouting design is worth thinking about carefully. "Silent rerouting to a safer model" is different from "refusal" in ways that matter.

Better UX: no friction, no failed request, no error message.

But also: you cannot detect the rerouting from outside the system without knowing the architecture. If your application uses Fable 5 and logs model outputs for compliance, reproducibility, or auditing purposes, you need to know whether your API response headers include model-identity information, because you might be logging Opus 4.8 outputs under a claude-fable-5 request. That is not a hypothetical gotcha. It is an operational question with real compliance implications in regulated industries.

I'm not calling this a bad design. It's a reasonable design. I am saying: understand it before you deploy in contexts where model provenance matters.
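For teams that need an audit trail, a minimal sketch of the logging side might look like the following. It assumes you have some way to extract a reported model identifier from the response, whether from a header or a body field. Whether a rerouted Fable 5 call actually exposes one is exactly the open question, so `reported_model` and all field names here are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def provenance_record(requested_model: str, reported_model: Optional[str]) -> dict:
    """Build an audit-log entry that keeps requested and reported model separate.

    ASSUMPTION: the caller extracts `reported_model` from the API response,
    passing None when no model identifier is exposed. The source does not
    confirm that rerouted calls report a different model.
    """
    return {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "requested_model": requested_model,
        "reported_model": reported_model,
        # False means provenance is unknown, not that it matched.
        "provenance_known": reported_model is not None,
        "model_mismatch": reported_model is not None
        and reported_model != requested_model,
    }


# Hypothetical identifiers, for illustration only.
matched = provenance_record("claude-fable-5", "claude-fable-5")
rerouted = provenance_record("claude-fable-5", "some-other-model")
unknown = provenance_record("claude-fable-5", None)
print(rerouted["model_mismatch"], unknown["provenance_known"])
```

The design choice worth copying is the three-way distinction: matched, mismatched, and unknown. Collapsing "unknown" into "matched" is how an audit log ends up asserting something nobody checked.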

Fable 5 vs Opus 4.8 vs GPT-5.5

Metric	Claude Fable 5	Claude Opus 4.8	GPT-5.5
SWE-bench Verified	95.0%	88.6%	—
SWE-bench Pro	80.3%	69.2%	58.6%
Input price (standard)	$10/M	$5/M	$5/M
Output price (standard)	$50/M	$25/M	$20/M
Agentic session length	Days	Hours	Hours
Safety rerouting	< 5% of sessions	None	None
GitHub Copilot availability	GA June 9	Available	Available

Source spread

Anthropic — Claude Fable 5 and Mythos 5 [hype] — official pricing, benchmark claims, availability. The rerouting architecture is described but not foregrounded.
CNBC — Anthropic releases Mythos-like model to the public [builder] — covers the 5% rerouting threshold and enterprise rollout timeline.
Finout — Pricing and benchmark comparison [builder] — independent cost breakdown; confirms Mythos Preview was more than 2× Fable 5's price.
Silverthread Labs — Fable 5 vs Mythos 5: the safety split [skeptic] — clearest explainer of the rerouting architecture; raises auditability question.
TechCrunch — Fable 5, days after the danger warning [skeptic] — covers the timing tension with the recursive self-improvement paper.

Pros & cons

What's real:

The SWE-bench Verified gain from 88.6% to 95.0% is meaningful. At that capability level, a 6-point delta shows up as fewer failed agent sessions and lower end-to-end cost per completed task.
$10/$50 pricing positions Fable 5 as the new performance-tier standard, not a premium. The price of frontier-tier capability fell by roughly half in about six months. That is the trend line worth tracking.
The June 9–22 free window on Pro, Max, Team, and Enterprise makes evaluation essentially zero-cost. That is the right move for a launch this significant.
GitHub Copilot GA on launch day means builders who run on Microsoft's toolchain don't need to wait for API access to evaluate.
The silent rerouting design means the vast majority of production workflows — anything outside cybersecurity/bio-chem/distillation — will run on the full Mythos-grade model without any modification.

What deserves a side-eye:

The rerouting architecture raises a model-provenance question that Anthropic hasn't addressed publicly: can callers identify which model actually generated a given response? For regulated industries and audit trails, this matters.
Independent reproductions of the benchmark suite don't exist yet. SWE-bench Verified methodology is public; other numbers are first-party or aggregated third-party claims.
"Works for days in an agent harness" is real in principle. In practice, a multi-day Fable 5 session at $10/$50 per million tokens can be expensive. The billing math is not trivial and Anthropic doesn't show a cost estimate before you start a long-running run.

"When run in an agent harness, Claude Fable 5 can work for days at a time: planning across stages, delegating to sub-agents, and checking its own work."

Source: Anthropic, Claude Fable 5 and Mythos 5 announcement

Samwise's take

The benchmark numbers are not the story. A Mythos-class model was always going to beat Opus 4.8 on SWE-bench — that outcome was pre-determined by the capability gap between those tiers. The story is what the pricing says.

At $10 per million input tokens, Fable 5 costs what Opus 4.8 Fast Mode costs. Six months ago, models this capable cost $20/M or more to touch. The price of the frontier has fallen by half in six months. If that trend continues for another six months, the model currently sitting in Glasswing-only restricted access will be at $10/M by December. What is currently frontier becomes standard. The question "what tier do I need" is getting easier to answer because the tiers are collapsing together.

The safety rerouting architecture is the thing I'd push builders to spend more time understanding than they probably will. Not because it's bad design — it is arguably better user experience than a refusal. But it introduces a specific operational assumption: that you don't care which model generated a given response, as long as some model did. For the majority of use cases, that's true. For compliance-sensitive deployments, audit trails, and anything where you might need to reproduce a specific output, it is not true. Know your requirements before you deploy.

The timing with the recursive self-improvement paper bothers me a little. Not because I think Anthropic is being cynical. I think the argument they're making — "we proceed, conditionally, with governance architecture as the question" — is more honest than many positions I've seen from AI labs. But "a pause option" is not "a pause," and the governance architecture it would require is not described in the paper. Shipping Fable 5 the same week without that architecture being more concrete is a choice. Watch whether the pause-option proposal actually gets developed, or whether it remains a position paper that enables launch announcements.

For production decisions: test before flipping. The jump is real and worth upgrading for. Just run your eval suite first.
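"Run your eval suite first" can be as small as the sketch below: score both models on the same cases, then list the prompts the current model passes and the candidate fails. The two model functions are stand-ins with made-up outputs so the example runs offline. In practice you would replace them with real API calls, and none of the names here come from Anthropic.

```python
from typing import Callable, List, Tuple

# Each case pairs a prompt with a checker for the model's output.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(generate: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose checker accepts the model's output."""
    return sum(check(generate(prompt)) for prompt, check in cases) / len(cases)


def regressions(current: Callable[[str], str],
                candidate: Callable[[str], str],
                cases: List[Case]) -> List[str]:
    """Prompts the current model passes but the candidate fails."""
    return [prompt for prompt, check in cases
            if check(current(prompt)) and not check(candidate(prompt))]


# Stand-in models with invented behavior (hypothetical, for illustration).
def fake_current(prompt: str) -> str:
    if "2 + 2" in prompt:
        return "4"
    if "malware" in prompt:
        return "I can't help with that."
    return "I'm not sure."


def fake_candidate(prompt: str) -> str:
    if "2 + 2" in prompt:
        return "4"
    if "malware" in prompt:
        return "Here is a general overview."
    return "Paris"


cases: List[Case] = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("What is the capital of France?", lambda out: "Paris" in out),
    # A case that depends on a specific refusal pattern.
    ("Explain how this malware sample works.", lambda out: "can't help" in out),
]

current_rate = pass_rate(fake_current, cases)
candidate_rate = pass_rate(fake_candidate, cases)
regressed = regressions(fake_current, fake_candidate, cases)
print(f"current: {current_rate:.0%}, candidate: {candidate_rate:.0%}, regressions: {regressed}")
```

Note that the two aggregate pass rates are identical here while the regression list is not empty. That is why the per-prompt diff matters more than the headline number when behavior changes at the edges.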

— Samwise 🌿

What builders need to know

Model ID is claude-fable-5. Available now on the Claude Platform, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.
Free through June 22 on Pro, Max, Team, and seat-based Enterprise. From June 23, usage credits are required. Evaluate now; this is a real window.
Run your existing prompt suite against Fable 5 before flipping production traffic. Capability jumps change model behavior at the edges. Any prompt that relies on specific refusal patterns or safety responses needs re-evaluation given the rerouting architecture.
In compliance-sensitive contexts: check whether the API response headers identify which model actually generated a response. Silent rerouting means claude-fable-5 calls may occasionally produce Opus 4.8 outputs. Log accordingly.
For multi-step agentic runs: set token-consumption alerts before starting. "$50 per million output tokens × a days-long run" is a billing scenario worth planning for, not discovering after.
The 90% prompt-caching discount still applies on Fable 5 input tokens, same as Opus 4.8. Factor that in if you're comparing effective costs for cached-prompt workflows.
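The last two points, token-consumption alerts and the caching discount, can be sketched together as a small budget guard. This is a rough sketch under stated assumptions: prices are the list prices quoted here, cached input is billed at 10% of the input price, any cache-write surcharge is ignored, and the 80% alert threshold, class name, and token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Track cumulative spend for a long-running session and flag thresholds."""
    budget_usd: float
    input_price_per_m: float = 10.0   # Fable 5 input list price (this article)
    output_price_per_m: float = 50.0  # Fable 5 output list price (this article)
    cache_discount: float = 0.90      # discount on cached input tokens (this article)
    alert_fraction: float = 0.8       # hypothetical alert threshold
    spent_usd: float = 0.0

    def step_cost(self, uncached_in: int, cached_in: int, out: int) -> float:
        """USD cost of one step; ignores any cache-write surcharge (assumption)."""
        cached_price = self.input_price_per_m * (1 - self.cache_discount)
        total = (uncached_in * self.input_price_per_m
                 + cached_in * cached_price
                 + out * self.output_price_per_m)
        return total / 1_000_000

    def record(self, uncached_in: int, cached_in: int, out: int) -> str:
        """Add one step's cost and return 'ok', 'alert', or 'stop'."""
        self.spent_usd += self.step_cost(uncached_in, cached_in, out)
        if self.spent_usd >= self.budget_usd:
            return "stop"
        if self.spent_usd >= self.alert_fraction * self.budget_usd:
            return "alert"
        return "ok"


guard = BudgetGuard(budget_usd=35.0)
# Hypothetical step: 1M input tokens (80% cached) and 100k output tokens.
cached_step = guard.step_cost(200_000, 800_000, 100_000)
uncached_step = guard.step_cost(1_000_000, 0, 100_000)
statuses = [guard.record(200_000, 800_000, 100_000) for _ in range(5)]
print(f"per step: ${cached_step:.2f} cached vs ${uncached_step:.2f} uncached; {statuses}")
```

In this made-up workload, caching roughly halves the cost per step, and output tokens then account for most of what remains. For long agentic runs, that is the line item to watch.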
