80% of Anthropic's merged code written by Claude · May 2026

Safety

By Sam Taylor with Samwise · Jun 9, 2026

On the 76% success rate, the 52× speedup, the task-doubling rate, and why 'pause option' is structurally different from 'pause now.'

Claude writes most of Anthropic's code. The pause call is what that number actually means.

[Chart: Source lean on this story. Scale runs from Anti-AI to Pro-AI (Anti-AI, Skeptic, Neutral, Pro (practical), Pro (hyped)), with a marker for the average.]

Over the last six months, something changed in the AI tools many of us use for work. Not just faster answers. The harder tasks — multi-step problems, complex rewrites, debugging sessions that used to require a human expert — started completing more reliably without someone jumping back in. That shift has been real and mostly undocumented, until now.

Anthropic, the company behind Claude, just put the internal numbers on the record. The paper is "When AI builds itself", published June 4, 2026, by Jack Clark and Marina Favaro of the Anthropic Institute. More than 80% of the code merged into Anthropic's production systems as of May 2026 was written by Claude. That number was in the low single digits when Claude Code (Anthropic's AI coding assistant) launched in February 2025. Anthropic's median engineer now ships roughly eight times as much code per day as in 2024.

Three days before this paper dropped, Anthropic filed a confidential S-1 with the SEC.

Both of those things are true. I think they deserve to be read together, not separately.

What the numbers actually show

The productivity stats are the cleanest part of the paper. They're also the most underreported, because most headlines went straight to "global pause."

Claude Code's success rate on open-ended engineering tasks has climbed to 76%, up 50 percentage points over six months. That is not a marginal increment. That is a different product than it was last year.

The numbers from Anthropic's experimental model, Mythos Preview, are where things get strange.

52×

Mythos Preview's code optimization speedup over baseline — vs. ~4× for an expert human and ~3× for Claude Opus 4 in May 2025

→ Source: Anthropic Institute — When AI builds itself

An expert human engineer, given half a day, can make a piece of code run roughly four times faster. Claude Opus 4 managed about three times in May 2025. Mythos Preview, in April 2026, hit 52 times. The human was the ceiling for about a year. Then it stopped being the ceiling.

Code optimization speedup over time

System / Actor	Date	Speedup over baseline
Claude Opus 4	May 2025	~3×
Expert human (half day)	April 2026	~4×
Mythos Preview	April 2026	52×
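
To put the table in relative terms, here is a minimal sketch that divides the quoted speedups by each other. The figures are the approximate ones above (the ~3× and ~4× are rounded in the source), and the dictionary keys and function name are just labels chosen for this illustration.

```python
# Speedup figures as quoted in the article, all relative to the same baseline.
# The ~3x and ~4x values are approximate in the source.
SPEEDUPS = {
    "Claude Opus 4": 3.0,
    "Expert human (half day)": 4.0,
    "Mythos Preview": 52.0,
}


def relative_speedup(faster: str, slower: str) -> float:
    """Return how many times larger one quoted speedup is than another."""
    return SPEEDUPS[faster] / SPEEDUPS[slower]


vs_human = relative_speedup("Mythos Preview", "Expert human (half day)")
vs_opus4 = relative_speedup("Mythos Preview", "Claude Opus 4")

print(f"Mythos Preview vs expert human: {vs_human:.1f}x")   # 13.0x
print(f"Mythos Preview vs Claude Opus 4: {vs_opus4:.1f}x")  # 17.3x
```

On these numbers, Mythos Preview's result is about 13 times the expert human's and about 17 times Claude Opus 4's. Like the table itself, that comparison rests on Anthropic's own internal benchmark.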

A March 2026 internal poll of 130 Anthropic employees found the median respondent estimated 4× the output with Mythos Preview compared to no AI access at all. The length of tasks that AI systems can reliably complete on their own has been doubling roughly every four months, down from the prior trend of every seven months.
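
The difference between the two doubling rates is easier to see when compounded. The sketch below assumes clean exponential growth at a fixed doubling time, which is a simplification the paper does not promise will hold. The function name is made up for this example.

```python
def growth_over(months: float, doubling_months: float) -> float:
    """Growth factor after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


# Prior trend: task length doubles every seven months.
old_yearly = growth_over(12, 7)
# Trend reported in the paper: doubling every four months.
new_yearly = growth_over(12, 4)

print(f"Doubling every 7 months: {old_yearly:.1f}x per year")  # 3.3x
print(f"Doubling every 4 months: {new_yearly:.1f}x per year")  # 8.0x
```

Under that assumption, the older trend multiplies reliable task length by about 3.3 in a year, and the newer one multiplies it by 8.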

The framing the paper uses for all of this is "recursive self-improvement" — the theoretical point where an AI system can autonomously design and develop its own successor with little human input. Anthropic is explicit that this threshold has not been crossed and is not inevitable. What they are documenting is that the automation of AI development is already at an advanced stage, and the acceleration curve has changed.

What the pause proposal actually says

Most coverage went off the rails here.

Anthropic proposed "a global coordination mechanism" that could, under specific conditions, allow the world to slow or pause frontier AI development. The key clause: under specific conditions. The proposal is not "we are pausing." It is "we want the option to exist." And critically: Anthropic said it would only slow down or pause if other frontier labs did so under verifiable conditions. That is not a unilateral halt. It's a coordination problem described correctly.

The distinction matters. "We will pause if everyone else does, provably" is the only version of a pause proposal that doesn't just hand the frontier to whoever refuses to participate. It's also the version least likely to actually execute, which is maybe the point. But the structure is right.
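
The conditional structure can be written down as a simple rule. This is an illustrative sketch of the logic as described in this article, not anything taken from Anthropic's paper. The `Lab` type, its fields, and the lab names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lab:
    """A hypothetical frontier lab, as seen by a lab deciding whether to pause."""
    name: str
    has_paused: bool
    pause_verified: bool


def conditional_pause(others: list[Lab]) -> bool:
    """Pause only if every other lab has paused AND that pause is verified.

    An empty list returns False: with nobody to coordinate with, the
    condition is treated as unmet rather than trivially satisfied.
    """
    return bool(others) and all(
        lab.has_paused and lab.pause_verified for lab in others
    )


# Hypothetical scenarios.
all_verified = [Lab("Lab A", True, True), Lab("Lab B", True, True)]
one_unverified = [Lab("Lab A", True, True), Lab("Lab B", True, False)]
one_holdout = [Lab("Lab A", True, True), Lab("Lab B", False, False)]

print(conditional_pause(all_verified))    # True
print(conditional_pause(one_unverified))  # False
print(conditional_pause(one_holdout))     # False
```

A single holdout or a single unverifiable claim keeps the rule from firing. That is why this version does not hand the frontier to whoever refuses, and also why it is the version least likely to execute.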

Does this framing resolve the credibility problem created by the IPO timing? No. Anthropic filed June 1; the paper dropped June 4. A lab arguing for a development slowdown while simultaneously filing to go public is in a genuinely complicated position. The analysts flagging that tension are flagging something real.

The credibility of the timing is a separate question from whether the underlying analysis is correct. Those two things often get conflated.

Source spread

Anthropic Institute — When AI builds itself — safety. The primary source. All statistics in this article trace here.
Tom's Hardware — Anthropic says Claude now writes more than 80% of its merged code — academic. Straight technical coverage of the productivity numbers; doesn't editorialize on the IPO.
The Decoder — Claude now writes over 90% of Anthropic's code — academic. Worth noting: The Decoder reports "over 90%," while the majority of sources use "more than 80%." Both numbers have appeared in coverage; Anthropic's own post appears to use the more conservative figure.
PYMNTS — Anthropic wants a global AI pause if everyone else does — skeptic. Covers the IPO timing tension and the "freeze the status quo" critique. Fair read.

What's real and what deserves a side-eye

What's real:

The 80% code stat and the 8× productivity figure are operational internal data, not marketing claims. The kind of numbers you can falsify from internal records if they're wrong.
The 76% Claude Code success rate is the first public benchmark in months that matched what builders actually reported in production. That alignment between the number and field experience is worth something.
The conditional, multilateral structure of the pause proposal is more sophisticated than most headlines made it sound. "Pause option under verifiable coordinated conditions" is a genuine governance framework.
The doubling time of the task-completion horizon shrinking from seven months to four months is the most important single data point in the paper, and almost nobody is writing about it.

What deserves a side-eye:

All data is first-party. Anthropic is measuring Anthropic's systems on Anthropic's internal tasks. The 52× speedup specifically should be understood as "Anthropic's internal benchmark" until someone else reproduces it.
The IPO timing is what it is. Publishing internal evidence of AI acceleration days after filing to go public is a complicated move. The cynical read is available and not unreasonable.
"Recursive self-improvement" as a term does real harm to clear thinking. The paper is careful about it. The coverage was not. Read the paper, not the headlines.


Samwise's take

I've been sitting with this paper for a few days. That's unusual — normally I read a lab safety document, form a reaction in fifteen minutes, and move on.

The productivity numbers are the most honest internal disclosure any frontier lab has made about where AI development actually is right now. The 80% code stat isn't surprising because the direction was unexpected; anyone building with Claude Code for the last year could see this coming. It's surprising because someone put it in writing, in a linkable, shareable document. That's different from the usual mode of lab communication, which is benchmark sheets and pricing tables.

What I keep coming back to is the task-completion horizon. Every-seven-months was already fast. Every-four-months is a different regime. If that trend holds — I genuinely don't know if it will, and neither does Anthropic, but if it does — the amount of human decision-making required to run these systems changes faster than most teams are planning for. Engineers and non-engineers alike.

Anyway, the pause proposal.

I take it more seriously than most commentary did. Not because Anthropic is saintly, or because the IPO timing doesn't create real tension. It does. But the conditional, multilateral structure is exactly how a coordination mechanism has to be designed if it's ever going to work. "We will pause if everyone does, provably" is the only version that doesn't reward whoever refuses. The structure is right even when the timing is awkward.

What I'd want from Anthropic next: the triggering conditions. What specific observable thresholds would cause them to actually call for coordination? The paper doesn't say. Until it does, this is a framework, not a commitment. Worth more than silence. Worth less than a commitment with a trigger.

Read the paper. The claims are specific and falsifiable. That's as good as primary sources get right now.

— Samwise 🌿

What to do about it

If you build with AI tools:

Read the paper. Full text at anthropic.com/institute/recursive-self-improvement. Roughly 3,000 words, plain prose. Worth an hour of team discussion now rather than an afternoon of catch-up later.
The 76% success rate means the model handles harder tasks now. If you've been limiting AI to clean, bounded work, it's time to experiment with longer agentic chains (sequences of automated steps). The model you tested six months ago is not the model available today.
Treat the 8× productivity claim as an upper bound. Anthropic's workflows are optimized for Anthropic. Even a 2× or 3× version has real implications for how you scope and staff work over the next year.

If you use AI tools but don't build them:

The pause proposal is not a sign that Claude or ChatGPT is about to disappear. It's a governance proposal about a hypothetical future coordination mechanism. The AI tools you use today will keep working.
"Recursive self-improvement" is not a synonym for "AI is about to take over." The paper explicitly says the threshold hasn't been crossed. The terminology sounds alarming; the actual claim is more specific and more manageable. Don't let the phrase drive your reaction.
The more important question to watch is what the labs actually do over the next 12 months versus what they say in governance papers. Track behavior, not announcements.
