Vol. 1 · Edition 025 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

Chan-Lam yield · before: 16.6% · after AI: 25.2%
Paper
By Sam Taylor with Samwise

On Chan-Lam coupling, what TEMPO actually fixed, and what 'near-autonomous' means when the AI is working with physical molecules instead of code.

An AI proposed a drug discovery fix. The wet lab ran 10,080 experiments to check. It held.

Source lean on this story (Anti-AI → Pro-AI): Anti-AI 0 · Skeptic 0 · Neutral 1 · Pro (practical) 2 · Pro (hyped) 1

OpenAI and Molecule.one, a Polish chemistry startup, published results on June 17 from a three-month collaboration. The structure: GPT-5.4 acted as the research lead, generating and ranking chemistry hypotheses. Human chemists reviewed the highest-ranked proposals and selected which ones to test. Molecule.one's Maria AI then ran the physical experiments in an automated lab. 10,080 wet-lab reactions total. The AI's hypothesis held.

The reaction they targeted is Chan-Lam coupling, specifically the coupling of primary sulfonamides with arylboronic acids. The sulfonamide pharmacophore (the core molecular structure responsible for drug activity in this context) shows up in more than 91 FDA-approved drugs across oncology, antimicrobials, and cardiology. The coupling reaction that attaches sulfonamides to target molecules has historically produced poor yields, and medicinal chemists have been working around the problem for years. OpenAI called it "a challenging reaction." That is accurate, and an understatement.

GPT-5.4 proposed using TEMPO, a stable radical compound used as a mild oxidant, to address the problem. The failure mode the model was solving has two parts: primary sulfonamides have low nucleophilicity (they don't react aggressively), and the boronic acids they couple with tend to degrade oxidatively before the reaction completes. TEMPO improved generality and reduced that oxidative deboronation. In plain terms, it made the reaction work more reliably across a broader range of substrate combinations.

The result: average yield improved from 16.6% to 25.2%. The share of reactions clearing the 30% yield threshold (the practical floor for most pharmaceutical applications) more than doubled, from 15.6% to 37.5%.
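The two headline numbers are easier to compare once the percentage-point change is separated from the relative change. A minimal sketch of that arithmetic follows; the helper name `gains` is hypothetical, and the inputs are the four figures quoted above.

```python
def gains(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change as a fraction)."""
    delta = after_pct - before_pct
    return delta, delta / before_pct


# Figures quoted in the article, in percent.
avg_pp, avg_rel = gains(16.6, 25.2)   # average yield
hit_pp, hit_rel = gains(15.6, 37.5)   # share of reactions clearing 30% yield

print(f"average yield: +{avg_pp:.1f} points, {avg_rel:+.0%} relative")
print(f"share clearing 30%: +{hit_pp:.1f} points, {hit_rel:+.0%} relative")
# average yield: +8.6 points, +52% relative
# share clearing 30%: +21.9 points, +140% relative
```

The average moved by 8.6 points, while the share of usable reactions moved by 21.9 points, which is why the second number is the more interesting one for a medicinal chemistry team.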


Pros & cons

What's actually interesting:

  • This is the first publicly documented case of a frontier model driving a validated wet-lab chemistry discovery. Not a simulation. Not a literature summary dressed up as research. Physical chemistry, physical experiments, results confirmed independently by human chemists who weren't rooting for the AI to be right.
  • The yield improvement is practically meaningful. Raising the fraction of reactions clearing 30% from 15.6% to 37.5% is the kind of gain that changes whether a medicinal chemistry team pursues a compound series in early discovery. Real downstream impact on real drug pipelines.
  • TEMPO was not an obvious answer. Medicinal chemists had been working on this reaction for years and hadn't tried TEMPO as the fix at scale. The AI came at it from a direction the field had left unexplored.
  • The workflow is transferable. Generate → rank → human review → physical execution → analyze → iterate is a pattern that doesn't require specialized AI infrastructure beyond access to a frontier model. The lab automation layer (Maria AI, in this case) is the bespoke piece.

What deserves skepticism:

  • "Near-autonomous" is carrying a lot of weight in the framing. Human chemists picked which proposals went to the lab, corrected experimental plans during execution, and independently validated the final result. The AI was central to the discovery. It was not autonomous.
  • 10,080 reactions ran over three months using Molecule.one's high-throughput lab automation. Reproducing this at an academic lab or a company without that infrastructure is not straightforward.
  • One reaction improved. The TEMPO fix applies to Chan-Lam coupling of primary sulfonamides specifically. The paper's limitations section is honest about substrate scope. Don't generalize this to other coupling reactions without testing.

Chan-Lam coupling: before vs. after AI-proposed TEMPO

Metric | Before | After TEMPO
Average yield | 16.6% | 25.2%
Reactions clearing 30% yield | 15.6% | 37.5%
Optimization method | Human design, traditional | GPT-5.4 hypothesis + Maria AI execution
Total experiments run | 10,080 reactions (across the whole study)

Samwise's take

What builders need to know

  • The transferable unit here is the workflow, not the chemistry. If you have any domain with a "propose → screen → validate" bottleneck, GPT-5.4-class models are now worth testing for the proposal step. The lab automation layer is the bespoke piece; the model layer isn't.
  • Molecule.one takes academic and industry partnerships. If you're in biotech or pharma and want to run something like this, molecule.one is the obvious first contact.
  • The TEMPO fix is specific to Chan-Lam coupling of primary sulfonamides. The paper's limitations section is the part to read before assuming this transfers to your target reactions.
  • The research PDF at cdn.openai.com contains the full methodology — worth reading if you want to understand how they structured the AI's hypothesis generation process and what guardrails prevented the model from just pattern-matching on known TEMPO literature.
  • This was GPT-5.4, not a chemistry-specific fine-tune. If you've been assuming you need specialized models for scientific reasoning tasks, this result suggests the current frontier generalist models are worth evaluating first.
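For anyone who wants to try the pattern in another domain, the generate → rank → human review → physical execution → analyze → iterate loop described earlier can be sketched in a few lines. This is an illustrative skeleton only, not the method from the paper: every name here (`Hypothesis`, `run_round`, and the three stub functions) is hypothetical, and the stub yield is a placeholder value, not a result.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical model-assigned ranking score


def run_round(
    propose: Callable[[list[str]], list[Hypothesis]],
    human_review: Callable[[list[Hypothesis]], list[Hypothesis]],
    execute: Callable[[Hypothesis], float],
    history: list[str],
    top_k: int = 3,
) -> list[tuple[Hypothesis, float]]:
    """Run one propose -> rank -> review -> execute -> record cycle."""
    candidates = propose(history)  # generate
    ranked = sorted(candidates, key=lambda h: h.score, reverse=True)[:top_k]  # rank
    approved = human_review(ranked)  # human gate before anything is run
    results = [(h, execute(h)) for h in approved]  # physical execution
    for h, measured in results:  # record outcomes for the next round
        history.append(f"{h.text}: yield {measured:.1f}%")
    return results


# Stubs standing in for the model, the chemists, and the automated lab.
def stub_propose(history: list[str]) -> list[Hypothesis]:
    return [
        Hypothesis("additive A", 0.9),
        Hypothesis("additive B", 0.4),
        Hypothesis("additive C", 0.7),
    ]


def stub_review(ranked: list[Hypothesis]) -> list[Hypothesis]:
    return ranked[:1]  # a real reviewer would veto or edit; this approves the top pick


def stub_execute(h: Hypothesis) -> float:
    return 20.0  # placeholder measurement, not real data


history: list[str] = []
results = run_round(stub_propose, stub_review, stub_execute, history, top_k=2)
```

The design point is that the model only touches `propose`. The review and execution steps are plain callables, so a human sign-off and a lab automation layer can be swapped in without changing the loop.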

