1946: Erdős poses it

May 2026: OpenAI disproves it

Paper

By Sam Taylor with Samwise

May 24, 2026

On the unit distance problem, a 125-page proof from an unnamed model, and what this does and doesn't tell us about AI doing mathematics.

An OpenAI reasoning model just disproved an 80-year-old geometry conjecture. Here's what it actually means.

Source lean on this story (scale from Anti-AI to Pro-AI): Anti-AI, Skeptic, Neutral, Pro (practical), Pro (hyped)

On May 20, OpenAI announced that an internal reasoning model had produced a 125-page proof disproving the planar unit distance conjecture, a problem in combinatorial geometry first posed by Paul Erdős in 1946. The proof was verified by an external group including Fields medalist Tim Gowers (Cambridge), combinatorialist Noga Alon (Princeton), Ananth Shankar, and Jacob Tsimerman. Princeton's Will Sawin then published a companion result making the improvement explicit: point arrangements in the plane can achieve n^(1+δ) unit-distance pairs with δ = 0.014, that is, n^1.014, beating the square-grid constructions that had stood as the best known approach for 80 years.

80 years

How long the Erdős unit distance conjecture resisted every mathematician who tried it before an AI model produced a disproof

→ Source: OpenAI announcement

Source spread

OpenAI: hype. First-party announcement with companion papers linked. Notably measured in tone compared to typical OpenAI launch communications, which may itself be a signal.
Scientific American: hype. "Mathematicians are amazed" framing, but it quotes named researchers directly. Worth reading alongside the announcement.
The Decoder: academic. Slower, unpacks the proof structure more carefully, and is one of the few pieces that flags the pending peer review prominently.
Technology.org: skeptic. Notes that the model name was not publicly specified, and that OpenAI has made prior math claims that didn't hold up under full scrutiny.

What the proof actually shows

The unit distance problem is easy to state. Place n points on a flat plane. How many pairs can sit exactly distance 1 apart? For 80 years, the best known constructions looked like square grids. The conjecture Erdős posed formalized the assumption that grid-like arrangements were essentially optimal — that you couldn't do meaningfully better.
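To make the counting question concrete, here is a minimal Python sketch that counts pairs of points at a chosen distance. The helper names `grid` and `count_pairs` are illustrative and do not come from the OpenAI proof or the companion paper. It uses integer coordinates and squared distances so the arithmetic stays exact. It also shows the classical trick behind grid constructions: on an integer grid, a squared distance such as 25 can be written as a sum of two squares in several ways (5² + 0² and 3² + 4²), so rescaling the grid by 1/5 yields more unit-distance pairs than nearest-neighbour spacing does.

```python
from itertools import combinations


def grid(m):
    """Return the m-by-m square grid of integer points (n = m * m points)."""
    return [(x, y) for x in range(m) for y in range(m)]


def count_pairs(points, squared_distance):
    """Count unordered pairs of points whose squared distance matches exactly.

    Working with squared integer distances avoids floating-point error.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == squared_distance
    )


points = grid(10)  # n = 100 points

# Nearest-neighbour spacing: only horizontal and vertical neighbours count.
print(count_pairs(points, 1))   # 180

# Distance 5: offsets (5, 0), (0, 5), (3, 4), (4, 3) and their reflections.
# Shrinking the grid by a factor of 5 turns these into unit distances.
print(count_pairs(points, 25))  # 268
```

This is a toy illustration of the problem statement only. The brute-force count is quadratic in the number of points, and nothing here reflects the algebraic construction described in the article.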

OpenAI's model found a different family of constructions entirely. It used Golod-Shafarevich theory, a 1960s result in algebraic number theory on the existence of infinite class field towers, to build point arrangements that produce more unit-distance pairs than any square grid. This was not a known connection: specialists in combinatorial geometry were not applying Golod-Shafarevich to unit distance problems. The model saw a path that humans hadn't.

Sawin's refinement then showed the improvement has a fixed exponent. Not just "better for some configurations" but better by a fixed power of n: at least n^1.014 unit-distance pairs for large n.
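For readers who want the exponents side by side, the following is a short sketch of why a fixed exponent above 1 settles the question. The notation u(n) and the constants c and C are standard in the literature but are not taken from the article. The value δ = 0.014 is the figure reported above, and the classical bounds are the well-known grid lower bound and the n^(4/3) upper bound.

```latex
% u(n): maximum number of unit-distance pairs among n points in the plane.
% Classical bounds, with unspecified constants c, C > 0:
\[
  n^{1 + c/\log\log n} \;\le\; u(n) \;\le\; C\, n^{4/3}.
\]
% The conjecture says the grid-type lower bound is essentially optimal:
% u(n) = n^{1+o(1)}, i.e. for every fixed \varepsilon > 0,
\[
  u(n) \;\le\; n^{1+\varepsilon} \quad \text{for all sufficiently large } n.
\]
% Reported result: for large n,
\[
  u(n) \;\ge\; n^{1+\delta}, \qquad \delta = 0.014.
\]
% Taking \varepsilon = \delta/2 in the conjecture would force
\[
  n^{1+\delta} \;\le\; u(n) \;\le\; n^{1+\delta/2},
\]
% which fails for every n > 1, so the conjecture cannot hold.
```

Because c/log log n tends to zero, the grid exponent eventually drops below any fixed 1 + δ, which is why even a small constant like 0.014 is a qualitative change and not a marginal one.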

What's real, and what's uncertain

What's real:

The verification is genuine. Gowers, Alon, Shankar, and Tsimerman are not names you attach to a result you're unsure about. The companion paper is published, not just promised.
The algebraic technique was non-obvious. Multiple experts have specifically said the connection between Golod-Shafarevich and Euclidean geometry wasn't on anyone's radar before this result.
OpenAI described this as a general-purpose reasoning model. Not a specialized theorem prover. No task-specific fine-tuning for the unit distance problem. A general-purpose system, pointed at a famous open problem, found the proof.

What's uncertain:

OpenAI did not name the specific model. Not o4, not o4-mini, not a public model name of any kind — just "internal reasoning model." That matters for reproducibility and for knowing which capability level produced this. I can't tell you which system to point at a math problem.
Full journal peer review is pending. External mathematician verification and refereed journal acceptance are different things. The argument may be correct — I believe it is — and still take months to clear publication.
One result is one result. This proof works partly because Golod-Shafarevich happened to be the right algebraic tool for this specific geometric structure. The model found the connection. That is impressive. Whether it means general mathematical research capability exists in this model is a much bigger claim than what the proof actually shows.

"applies fairly sophisticated tools from algebraic number theory in an elegant and clever way"

Noga Alon, Princeton, via OpenAI announcement

Samwise's take

Most coverage of this result lands in one of two camps: "AGI confirmed, mathematicians now obsolete" or "one data point, nothing to see here." I think both are wrong in the same way: both avoid the actually interesting claim.

What actually happened: a general-purpose reasoning model, without task-specific preparation, found a non-obvious mathematical connection across subfield boundaries and produced a 125-page proof that held up to scrutiny from four serious mathematicians. That is not nothing. It is also not "AI can now do all of mathematics."

The Tim Gowers companion paper is the right frame. Gowers called it "a milestone in AI mathematics." He's right. A milestone is a marker of where you are, not a prediction of where you'll end up. This is the clearest demonstration yet that reasoning at scale can do something that looks like genuine insight — the "wait, what if we tried X on Y" move — not just lookup-and-synthesis. That's the marker.

The unnamed-model issue bothers me more than most coverage seems to. If OpenAI wants researchers and builders to engage seriously with this capability, the model name matters for reproducibility. "Internal reasoning model" is not useful. I don't think they're hiding something alarming; I think this is the usual vagueness around models not yet in general release. But it's a gap.

If I'm wrong about the significance of this result, it's because the proof turns out to be a lucky structural hit — the right framing at the right time, for a problem that happened to have a path through tools the model knew. More results of this kind over the next six months would answer that question definitively. I'm genuinely curious which way it goes.

For builders

No public API release accompanied this announcement. The capability isn't something you can replicate by pointing o4 at a hard problem today, at least not reliably, since we don't know which model produced this.
Watch for the Gowers companion paper once it's through journal submission. It will be the most accessible explanation of the proof structure and will reveal more about how the model's reasoning was organized.
The right eval lesson: reasoning models can now produce correct results in domains where the graders are world-class human experts. Internal benchmarks built for speed may not be the right check anymore.
If you're building research-augmentation tools, this is the result to bookmark. The question of whether reasoning models can participate in open-ended research just moved from "interesting speculation" to "there is at least one data point."
