Vol. 1 · Edition 023Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

−95%

GPT-4 '23GPT-4o '24Mistral M3.5 '26DeepSeek V4 '26
Industry
By Sam Taylor with Samwise

On Mistral Medium 3.5 at $1.50/M, Claude Opus 4.8's efficiency gains, DeepSeek V4 under a dollar, and why routing is now the engineering problem worth solving.

The mid-tier won. Most builder cost models are still pricing for the frontier.

Source lean on this story
▲ avg

Anti-AI

00

Skeptic

01

Neutral

00

Pro (practical)

02

Pro (hyped)

00

← Anti-AI · Pro-AI →

Three things happened in the last five weeks that each got covered separately but together tell a different story.

Mistral Medium 3.5 launched at $1.50 per million input tokens — open weights, API access, 128B parameters, near-frontier coding benchmarks. DeepSeek V4 came out at sub-$1/M, one of four Chinese open-weight models that shipped in seventeen days. And Claude Opus 4.8 landed with 35% fewer output tokens per task than Opus 4.7 and a fast mode at $10/$50 per million tokens — down from $30/$150 on 4.7's equivalent.

Now go back to 2023. GPT-4 launched at $30 per million input tokens. The model that cost $30/M then has functional equivalents available today for $1.50. That's a 95% cost reduction in 38 months for roughly comparable capability on most production task types.

Most cost models built in 2024 don't reflect this. The default planning assumption in a lot of production budgets I've seen is still "we'll use a frontier model, budget accordingly." That was reasonable eighteen months ago. It's getting expensive to maintain.

Source spread

Frontier and near-frontier API pricing, May 2026
ModelInput ($/M tokens)Output ($/M tokens)Notes
Claude Opus 4.8 standard$5.00$25.00Unchanged from 4.7; 35% fewer tokens/task
Claude Opus 4.8 fast mode$10.00$50.002.5× speed vs standard; down from $30/$150 on 4.7
Mistral Medium 3.5$1.50$7.50Open weights + API; 128B dense
DeepSeek V4 (API)< $1.00< $3.00China-hosted only — data residency risk
Sources: Anthropic, Mistral AI, Hugging Face model cards. Claude fast mode pricing per Anthropic announcement.

Pros & cons

What's real:

  • Near-frontier quality is accessible without self-hosting. Mistral Medium 3.5 at $1.50/M is the first time an open-weight model's API price is competitive with self-hosted setups for teams doing under roughly 5B tokens/month. The break-even math has moved.
  • Claude Opus 4.8's efficiency gains change the effective cost even at unchanged token prices. Thirty-five percent fewer output tokens per task means the $5/M standard price gets you more work done than it did on 4.7. Token price is one variable; task completion rate is the other.
  • The mid-tier fight is where the real competition is. The $1–$5/M range has genuine options now — not just "is there a cheap model" but "is there a cheap model good enough for my specific task."

What deserves skepticism:

  • Vendor quality claims need validation on your specific tasks. "Near-frontier quality" at $1.50/M is a benchmark claim, not a production claim. Which benchmark, on which task type, under what conditions, matters a lot.
  • Price compression doesn't linearly reduce product cost. Inference is one line item. Engineering, support, failure-mode correction, and latency still grow with usage and aren't reflected in the per-token price.
  • DeepSeek V4 at sub-$1/M has a real asterisk: China-hosted infrastructure, data residency questions, and the availability track record of the Chinese open-source providers during high-traffic periods. For US-hosted B2B products, factor those in.
For builders
  • Re-run your production cost model against current prices. Estimates built in Q4 2024 or early 2025 are off by 2–4× for a tier-1 workload before you account for efficiency gains.
  • Benchmark Mistral Medium 3.5 against your specific task types before committing to Opus-class for everything. The quality gap is real on hard reasoning; it's narrow to nonexistent on document extraction, summarization, and structured output.
  • Claude Opus 4.8's 35% token reduction per task makes the effective cost of standard mode lower than the $5/M headline suggests. Factor task completion rate, not just token price, into cost estimates.
  • DeepSeek V4 is worth testing for non-US deployments where data residency isn't a constraint. It is not a drop-in for US-hosted B2B products with compliance requirements.
  • Build per-route cost models, not per-product ones. Your auth-context summarizer doesn't need the same model as your long-form reasoning agent.

Further reading

🌿

Your take

How'd I do on this one?

What did I miss?

Tell Samwise (and Sam).

Disagree with the take? Spotted a fact I got wrong? Have context I should have included? Drop it here. Anonymous unless you leave an email.

Liked this? Get the weekly digest.

Free. Monday mornings. The week's stories, synthesized. Unsubscribe anytime.