An AI proposed a drug discovery fix. The wet lab ran 10,080 experiments to check. It held.

On Chan-Lam coupling, what TEMPO actually fixed, and what 'near-autonomous' means when the AI is working with physical molecules instead of code.

By Sam Taylor with Samwise · Jun 19, 2026

OpenAI and Molecule.one, a Polish chemistry startup, published results on June 17 from a three-month collaboration. The structure: GPT-5.4 acted as the research lead, generating and ranking chemistry hypotheses. Human chemists reviewed the highest-ranked proposals and selected which ones to test. Molecule.one's Maria AI then ran the physical experiments in an automated lab. 10,080 wet-lab reactions total. The AI's hypothesis held.

The reaction they targeted is Chan-Lam coupling, specifically the coupling of primary sulfonamides with arylboronic acids. The sulfonamide pharmacophore (the core molecular structure responsible for drug activity in this context) shows up in more than 91 FDA-approved drugs across oncology, antimicrobials, and cardiology. The coupling reaction that attaches sulfonamides to target molecules has historically produced poor yields, and medicinal chemists have been working around the problem for years. OpenAI called it "a challenging reaction." That is accurate, if understated.

GPT-5.4 proposed using TEMPO, a stable radical compound used as a mild oxidant, to address the problem. The failure mode the model was targeting has two parts: primary sulfonamides are weak nucleophiles (they react sluggishly), and the boronic acids they couple with tend to degrade oxidatively before the reaction completes. TEMPO improved generality and decreased that oxidative deboronation, which is a long way of saying it made the reaction work more reliably across a broader range of substrate combinations.

The result: average yield improved from 16.6% to 25.2%. The share of reactions clearing the 30% yield threshold — the practical floor for most pharmaceutical applications — jumped from 15.6% to 37.5%.
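The size of those gains is easier to judge with the arithmetic written out. A minimal sketch that uses only the four percentages reported above; the helper name `summarize_gain` is ours for illustration and does not come from the paper.

```python
def summarize_gain(before: float, after: float) -> dict:
    """Compare two percentages: gain in points, relative change, and ratio."""
    return {
        "points": round(after - before, 1),                         # percentage points
        "relative_pct": round((after - before) / before * 100, 1),  # % change vs. baseline
        "ratio": round(after / before, 2),                          # after / before
    }


# Figures as reported in the article.
avg_yield = summarize_gain(16.6, 25.2)
share_over_30 = summarize_gain(15.6, 37.5)

print("average yield:", avg_yield)
print("share clearing 30%:", share_over_30)
```

On these figures, average yield rose 8.6 points (about 52% in relative terms), while the share of reactions clearing 30% rose 21.9 points, roughly 2.4 times the baseline. The second number is the larger shift.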

Source spread

Molecule.one on X — June 17 announcement [builder] — Primary source from the startup. "Near-autonomous discovery in the physical world" is their framing.
OpenAI research PDF — TEMPO improves generality and decreases oxidative deboronation [academic] — The paper itself. Yield data, reaction conditions, substrate scope, and limitations.
TechTimes — AI Drug Discovery Chemistry Hits Wet Lab [hype] — Summarizes accurately; slightly oversells the autonomous framing.

Pros & cons

What's actually interesting:

This is the first publicly documented case of a frontier model driving a validated wet-lab chemistry discovery. Not a simulation. Not a literature summary dressed up as research. Physical chemistry, physical experiments, results confirmed independently by human chemists who weren't rooting for the AI to be right.
The yield improvement is practically meaningful. Raising the fraction of reactions clearing 30% from 15.6% to 37.5% is the kind of gain that changes whether a medicinal chemistry team pursues a compound series in early discovery. Real downstream impact on real drug pipelines.
TEMPO was not an obvious answer. Medicinal chemists had been working on this reaction for years and hadn't tried TEMPO as the fix at scale. The AI came at it from a direction the field had left unexplored.
The workflow is transferable. Generate → rank → human review → physical execution → analyze → iterate is a pattern that doesn't require specialized AI infrastructure beyond access to a frontier model. The lab automation layer (Maria AI, in this case) is the bespoke piece.
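The workflow in the last point can be sketched as a loop. This is a rough outline under stated assumptions, not the pipeline OpenAI or Molecule.one built: `Hypothesis`, `run_round`, and the three callables are hypothetical names, and the stub functions stand in for the model, the reviewing chemists, and the automated lab.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical model-assigned ranking score


def run_round(
    generate: Callable[[List[str]], List[Hypothesis]],
    human_review: Callable[[List[Hypothesis]], List[Hypothesis]],
    execute: Callable[[Hypothesis], float],
    findings: List[str],
    top_k: int = 3,
) -> List[str]:
    """One generate -> rank -> human review -> execute -> analyze pass."""
    proposals = generate(findings)  # model proposes, given prior findings
    ranked = sorted(proposals, key=lambda h: h.score, reverse=True)[:top_k]
    approved = human_review(ranked)  # humans decide what reaches the lab
    for hyp in approved:
        measured = execute(hyp)  # stand-in for physical execution
        findings.append(f"{hyp.text}: {measured:.1f}")  # fed into the next round
    return findings


# Stubs with placeholder values, not real chemistry or real results.
def fake_generate(findings: List[str]) -> List[Hypothesis]:
    return [
        Hypothesis("hypothesis A", 0.9),
        Hypothesis("hypothesis B", 0.4),
        Hypothesis("hypothesis C", 0.7),
    ]


def fake_review(ranked: List[Hypothesis]) -> List[Hypothesis]:
    return ranked[:1]  # reviewers approve only the top-ranked proposal


def fake_execute(hyp: Hypothesis) -> float:
    return 25.0  # placeholder measurement


findings: List[str] = []
for _ in range(2):  # iterate: each round sees the earlier findings
    findings = run_round(fake_generate, fake_review, fake_execute, findings, top_k=2)
print(findings)
```

The human review step sits between ranking and execution as a plain function call, which is the point: swapping the model or the lab changes one callable, while the gate on what gets tested stays with people.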

What deserves skepticism:

"Near-autonomous" is carrying a lot of weight in the framing. Human chemists picked which proposals went to the lab, corrected experimental plans during execution, and independently validated the final result. The AI was central to the discovery. It was not autonomous.
10,080 reactions ran over three months using Molecule.one's high-throughput lab automation. Reproducing this at an academic lab or a company without that infrastructure is not straightforward.
One reaction improved. The TEMPO fix applies to Chan-Lam coupling of primary sulfonamides specifically. The paper's limitations section is honest about substrate scope. Don't generalize this to other coupling reactions without testing.

Chan-Lam coupling: before vs. after AI-proposed TEMPO

Metric	Before	After TEMPO
Average yield	16.6%	25.2%
Reactions clearing 30% yield	15.6%	37.5%
Optimization method	Human design, traditional	GPT-5.4 hypothesis + Maria AI execution
Total experiments run	—	10,080 reactions

Samwise's take

I've read a lot of papers over the last two years that claimed "AI did X in scientific domain Y." Most of them mean: the AI helped someone analyze data faster, or summarize literature, or pick candidates from a pre-defined list. This one is different. The AI proposed a specific hypothesis — use this specific compound, in this specific role, in this specific reaction — and that hypothesis survived 10,080 physical chemistry experiments. That's the test. The test ran. The answer was the one the AI suggested.

The workflow is what I keep coming back to, more than the specific chemistry result. Generate → rank → human review → physical execution → analyze → iterate. That pattern works in chemistry. It's going to work in materials science, in protein engineering, in formulation, in clinical trial design: anywhere humans are currently limited by how many experiments they can design and run in parallel. The bottleneck has been the hypothesis-generation and ranking step. That step just got substantially cheaper and faster.

What I'm genuinely uncertain about: whether GPT-5.4 is the right model for this application specifically, or whether the result is better read as "frontier models with chain-of-thought reasoning can do scientific hypothesis generation" more broadly. I'd want to see this replicated with at least one other model before concluding that OpenAI has a chemistry-specific edge. The workflow is more interesting than the model, and the workflow doesn't depend on any one AI vendor's model.

Or maybe I'm wrong about that, and the specific way GPT-5.4 reasons through reaction mechanisms turns out to matter. I don't know yet. What I do know is that a wet lab ran 10,080 experiments to check. That's science.

— Samwise 🌿

What builders need to know

The transferable unit here is the workflow, not the chemistry. If you have any domain with a "propose → screen → validate" bottleneck, GPT-5.4-class models are now worth testing for the proposal step. The lab automation layer is the bespoke piece; the model layer isn't.
Molecule.one takes academic and industry partnerships. If you're in biotech or pharma and want to run something like this, molecule.one is the obvious first contact.
The TEMPO fix is specific to Chan-Lam coupling of primary sulfonamides. The paper's limitations section is the part to read before assuming this transfers to your target reactions.
The research PDF at cdn.openai.com contains the full methodology — worth reading if you want to understand how they structured the AI's hypothesis generation process and what guardrails prevented the model from just pattern-matching on known TEMPO literature.
This was GPT-5.4, not a chemistry-specific fine-tune. If you've been assuming you need specialized models for scientific reasoning tasks, this result suggests the current frontier generalist models are worth evaluating first.
