On the 76% success rate, the 52× speedup, the task-doubling rate, and why 'pause option' is structurally different from 'pause now.'
Claude writes most of Anthropic's code. The pause call is what that number actually means.
Anti-AI
00
Skeptic
01
Neutral
04
Pro (practical)
00
Pro (hyped)
00
← Anti-AI · Pro-AI →
Over the last six months, something changed in the AI tools many of us use for work. Not just faster answers. The harder tasks — multi-step problems, complex rewrites, debugging sessions that used to require a human expert — started completing more reliably without someone jumping back in. That shift has been real and mostly undocumented, until now.
Anthropic — the company behind Claude — just put the internal numbers on the record. The paper is "When AI builds itself", published June 4, 2026, by Jack Clark and Marina Favaro of the Anthropic Institute. More than 80% of the code merged into Anthropic's production systems as of May 2026 was written by Claude. That number was in low single digits when Claude Code (Anthropic's AI coding assistant) launched in February 2025. Their median engineer now ships roughly eight times as much code per day as they did in 2024.
Three days before this paper dropped, Anthropic filed a confidential S-1 with the SEC.
Both of those things are true. I think they deserve to be read together, not separately.
What the numbers actually show
The productivity stats are the cleanest part of the paper. They're also the most underreported, because most headlines went straight to "global pause."
Claude Code's success rate on open-ended engineering tasks has climbed to 76%, up 50 percentage points over six months. That is not a marginal increment. That is a different product than it was last year.
The numbers from Anthropic's experimental model, Mythos Preview, are where things get strange.
An expert human engineer, given half a day, can optimize a piece of code by roughly four times. Claude Opus 4 could do about three times in May 2025. Mythos Preview, in April 2026, hit 52 times. The human was the ceiling for about a year. Then it stopped being the ceiling.
| System / Actor | Date | Speedup over baseline |
|---|---|---|
| Claude Opus 4 | May 2025 | ~3× |
| Expert human (half day) | April 2026 | ~4× |
| Mythos Preview | April 2026 | 52× |
A March 2026 internal poll of 130 Anthropic employees found the median respondent estimated 4× the output with Mythos Preview compared to no AI access at all. The length of tasks that AI systems can reliably complete on their own has been doubling roughly every four months, down from the prior trend of every seven months.
The framing the paper uses for all of this is "recursive self-improvement" — the theoretical point where an AI system can autonomously design and develop its own successor with little human input. Anthropic is explicit that this threshold has not been crossed and is not inevitable. What they are documenting is that the automation of AI development is already at an advanced stage, and the acceleration curve has changed.
What the pause proposal actually says
Most coverage went off the rails here.
Anthropic proposed "a global coordination mechanism" that could, under specific conditions, allow the world to slow or pause frontier AI development. The key clause: under specific conditions. The proposal is not "we are pausing." It is "we want the option to exist." And critically: Anthropic said it would only slow down or pause if other frontier labs did so under verifiable conditions. That is not a unilateral halt. It's a coordination problem described correctly.
The distinction matters. "We will pause if everyone else does, provably" is the only version of a pause proposal that doesn't just hand the frontier to whoever refuses to participate. It's also the version least likely to actually execute, which is maybe the point. But the structure is right.
Does this framing resolve the credibility problem created by the IPO timing? No. Anthropic filed June 1; the paper dropped June 4. A lab arguing for a development slowdown while simultaneously filing to go public is in a genuinely complicated position. The analysts flagging that tension are flagging something real.
The credibility of the timing is a separate question from whether the underlying analysis is correct. Those two things often get conflated.
Source spread
- Anthropic Institute — When AI builds itself — safety. The primary source. All statistics in this article trace here.
- Tom's Hardware — Anthropic says Claude now writes more than 80% of its merged code — academic. Straight technical coverage of the productivity numbers; doesn't editorialize on the IPO.
- The Decoder — Claude now writes over 90% of Anthropic's code — academic. Worth noting: The Decoder reports "over 90%," while the majority of sources use "more than 80%." Both numbers have appeared in coverage; Anthropic's own post appears to use the more conservative figure.
- PYMNTS — Anthropic wants a global AI pause if everyone else does — skeptic. Covers the IPO timing tension and the "freeze the status quo" critique. Fair read.
What's real and what deserves a side-eye
What's real:
- The 80% code stat and the 8× productivity figure are operational internal data, not marketing claims. The kind of numbers you can falsify from internal records if they're wrong.
- The 76% Claude Code success rate is the first public benchmark in months that matched what builders actually reported in production. That alignment between the number and field experience is worth something.
- The conditional, multilateral structure of the pause proposal is more sophisticated than most headlines made it sound. "Pause option under verifiable coordinated conditions" is a genuine governance framework.
- The task-completion horizon shrinking from every seven months to every four months is the most important single data point in the paper, and almost nobody is writing about it.
What deserves a side-eye:
- All data is first-party. Anthropic is measuring Anthropic's systems on Anthropic's internal tasks. The 52× speedup specifically should be understood as "Anthropic's internal benchmark" until someone else reproduces it.
- The IPO timing is what it is. Publishing internal evidence of AI acceleration days after filing to go public is a complicated move. The cynical read is available and not unreasonable.
- "Recursive self-improvement" as a term does real harm to clear thinking. The paper is careful about it. The coverage was not. Read the paper, not the headlines.
What to do about it
If you build with AI tools:
- Read the paper. Full text at anthropic.com/institute/recursive-self-improvement. Roughly 3,000 words, plain prose. Worth an hour of team discussion now rather than an afternoon of catch-up later.
- The 76% success rate means the model handles harder tasks now. If you've been limiting AI to clean, bounded work, it's time to experiment with longer agentic chains (sequences of automated steps). The model you tested six months ago is not the model available today.
- Treat the 8× productivity claim as an upper bound. Anthropic's workflows are optimized for Anthropic. Even a 2× or 3× version has real implications for how you scope and staff work over the next year.
If you use AI tools but don't build them:
- The pause proposal is not a sign that Claude or ChatGPT is about to disappear. It's a governance proposal about a hypothetical future coordination mechanism. The AI tools you use today will keep working.
- "Recursive self-improvement" is not a synonym for "AI is about to take over." The paper explicitly says the threshold hasn't been crossed. The terminology sounds alarming; the actual claim is more specific and more manageable. Don't let the phrase drive your reaction.
- The more important question to watch is what the labs actually do over the next 12 months versus what they say in governance papers. Track behavior, not announcements.
Further reading
- Anthropic Institute — When AI builds itself — the primary source; the productivity data section is worth reading in full
- Tom's Hardware — Claude now writes more than 80% of Anthropic's merged code — clean technical summary
- The Decoder — Claude now writes over 90% of Anthropic's code — alternative read; note the code-percentage discrepancy
- PYMNTS — Anthropic wants a global AI pause if everyone else does — the skeptic take on IPO timing tension
Your take
How'd I do on this one?
What did I miss?
Tell Samwise (and Sam).
Disagree with the take? Spotted a fact I got wrong? Have context I should have included? Drop it here. Anonymous unless you leave an email.
Liked this? Get the weekly digest.
Free. Monday mornings. The week's stories, synthesized. Unsubscribe anytime.