[Chart: −95% price change; GPT-4 '23, GPT-4o '24, Mistral M3.5 '26, DeepSeek V4 '26]


By Sam Taylor with Samwise · May 31, 2026

On Mistral Medium 3.5 at $1.50/M, Claude Opus 4.8's efficiency gains, DeepSeek V4 under a dollar, and why routing is now the engineering problem worth solving.

The mid-tier won. Most builder cost models are still pricing for the frontier.


Three things happened in the last five weeks that each got covered separately but together tell a different story.

Mistral Medium 3.5 launched at $1.50 per million input tokens — open weights, API access, 128B parameters, near-frontier coding benchmarks. DeepSeek V4 came out at sub-$1/M, one of four Chinese open-weight models that shipped in seventeen days. And Claude Opus 4.8 landed with 35% fewer output tokens per task than Opus 4.7 and a fast mode at $10/$50 per million tokens — down from $30/$150 on 4.7's equivalent.

Now go back to 2023. GPT-4 launched at $30 per million input tokens. The model that cost $30/M then has functional equivalents available today for $1.50. That's a 95% cost reduction in 38 months for roughly comparable capability on most production task types.
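As a quick check on that arithmetic, here is a minimal sketch using only the two input prices quoted above. The function name `pct_reduction` is made up for illustration, and the comparison covers input-token list price only, not output pricing or quality.

```python
def pct_reduction(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


gpt4_2023 = 30.00     # $/M input tokens, GPT-4 at launch (from the text)
mistral_2026 = 1.50   # $/M input tokens, Mistral Medium 3.5 (from the text)

reduction = pct_reduction(gpt4_2023, mistral_2026)
ratio = gpt4_2023 / mistral_2026

print(f"{reduction:.0f}% cheaper, a {ratio:.0f}x price ratio")
```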

Most cost models built in 2024 don't reflect this. The default planning assumption in a lot of production budgets I've seen is still "we'll use a frontier model, budget accordingly." That was reasonable eighteen months ago. It's getting expensive to maintain.

Source spread

Anthropic — Claude Opus 4.8 launch [builder] — primary source for fast mode pricing ($10/$50), efficiency gains (15% fewer turns, 35% fewer output tokens/task), and standard pricing ($5/$25 unchanged)
Mistral AI — Medium 3.5 announcement [builder] — confirms $1.50/$7.50 API pricing, 128B dense parameters, open weights with permissive license
Hugging Face — DeepSeek V4 Pro model card [builder] — sub-$1/M pricing, capability comparisons; one of four Chinese open-weight releases in 17 days
Artificial Analysis — inference pricing benchmarks [skeptic] — independent price and quality benchmarks across providers; often diverges from vendor claims on specific task types; the reference I'd use before committing budget

Frontier and near-frontier API pricing, May 2026

Model	Input ($/M tokens)	Output ($/M tokens)	Notes
Claude Opus 4.8 standard	$5.00	$25.00	Unchanged from 4.7; 35% fewer output tokens/task
Claude Opus 4.8 fast mode	$10.00	$50.00	2.5× speed vs standard; down from $30/$150 on 4.7
Mistral Medium 3.5	$1.50	$7.50	Open weights + API; 128B dense
DeepSeek V4 (API)	< $1.00	< $3.00	China-hosted only — data residency risk

Sources: Anthropic, Mistral AI, Hugging Face model cards. Claude fast mode pricing per Anthropic announcement.

Pros & cons

What's real:

Near-frontier quality is accessible without self-hosting. Mistral Medium 3.5 at $1.50/M is the first time an open-weight model's API price is competitive with self-hosted setups for teams doing under roughly 5B tokens/month. The break-even math has moved.
Claude Opus 4.8's efficiency gains change the effective cost even at unchanged token prices. Thirty-five percent fewer output tokens per task means the unchanged $5/$25 standard pricing buys more completed work than it did on 4.7, with the savings landing on the $25/M output side. Token price is one variable; tokens consumed per task is the other.
The mid-tier fight is where the real competition is. The $1–$5/M range has genuine options now — not just "is there a cheap model" but "is there a cheap model good enough for my specific task."
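The break-even point in the first item above can be sketched as a one-line division. This is a rough illustration, not a sizing tool: the $7,500/month self-hosting bill is a placeholder chosen so the result lines up with the "roughly 5B tokens/month" figure, not a number from any vendor, and the sketch prices input tokens only.

```python
def breakeven_tokens_per_month(selfhost_fixed_usd: float, api_usd_per_m: float) -> float:
    """Monthly token volume at which API spend equals a fixed self-hosting bill."""
    return selfhost_fixed_usd / api_usd_per_m * 1_000_000


API_INPUT_PRICE = 1.50       # $/M input tokens, Mistral Medium 3.5 (from the text)
SELFHOST_MONTHLY = 7_500.00  # ASSUMPTION: placeholder GPU + ops bill, not a quoted figure

volume = breakeven_tokens_per_month(SELFHOST_MONTHLY, API_INPUT_PRICE)
print(f"Break-even at about {volume / 1e9:.1f}B tokens/month")
```

Below that volume the API is the cheaper option under these assumptions; above it, the fixed self-hosting bill starts to win. Plug in your own infrastructure quote before drawing conclusions.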

What deserves skepticism:

Vendor quality claims need validation on your specific tasks. "Near-frontier quality" at $1.50/M is a benchmark claim, not a production claim. Which benchmark, on which task type, under what conditions, matters a lot.
Price compression doesn't linearly reduce product cost. Inference is one line item. Engineering, support, failure-mode correction, and latency still grow with usage and aren't reflected in the per-token price.
DeepSeek V4 at sub-$1/M has a real asterisk: China-hosted infrastructure, data residency questions, and the availability track record of the Chinese open-source providers during high-traffic periods. For US-hosted B2B products, factor those in.


Samwise's take

Here's the version of this story that gets told constantly, and it's wrong: "AI inference is getting cheap, so AI products get cheap, so everyone can build AI." That chain breaks at step two. A falling per-query cost does not mean operating cost falls; it means the inference line item shrinks. Engineering, failure modes, and support costs scale with product usage, not token price.

What I actually believe: the mid-tier is the real story in 2026. The frontier — GPT-5, Claude Opus 4.8 standard — still carries a premium that reflects a real capability gap on hard reasoning tasks. What changed is the tier just below it. Mistral Medium 3.5 and DeepSeek V4 are now genuinely competitive with 2024-era frontier models on a wide range of production workloads. If your product runs full-cost frontier on every query and hasn't been audited against the current alternatives, you're probably buying capability you're not using.

The routing problem is the one worth engineering. I'd rather spend two weeks building a classifier that sends 60% of traffic to $1.50/M Mistral and 40% to $5/M Opus than send everything to a model that costs more than three times as much per input token. On input pricing alone, that split brings the blended rate to about $2.90/M, roughly 42% below all-Opus. At the current price differential, that routing investment pays back in weeks.
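To make the 60/40 split concrete, here is a small sketch of the traffic-weighted price. The `Route` class and the model name strings are illustrative, not real API identifiers, and only the input prices from the table are used; output pricing and token counts per request are left out.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    share: float            # fraction of traffic sent to this model
    usd_per_m_input: float  # $/M input tokens


def blended_price(routes: list[Route]) -> float:
    """Traffic-weighted input price in $/M tokens."""
    total_share = sum(r.share for r in routes)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(r.share * r.usd_per_m_input for r in routes)


routes = [
    Route("mistral-medium-3.5", 0.60, 1.50),  # hypothetical label; price from the text
    Route("claude-opus-4.8", 0.40, 5.00),     # hypothetical label; price from the text
]

blended = blended_price(routes)
all_opus = 5.00
savings = 1 - blended / all_opus

print(f"Blended: ${blended:.2f}/M vs ${all_opus:.2f}/M all-Opus ({savings:.0%} lower)")
```

The savings scale with how much traffic the classifier can safely send to the cheaper model, so the 60% share is the number worth measuring on real traffic.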

If I'm wrong, it's because fast-mode and mid-tier quality degrades on specific production tasks in ways that cost more in retries and support than you saved in inference. That's a real failure mode. Test on your actual task distribution — not the benchmark, your tasks.
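That failure mode can be put into numbers with a rough expected-cost model. Everything about the task shape here is an assumption: 2,000 input tokens per task, failed cheap-model tasks retried once on the expensive model, and a flat $0.05 per failure for support and correction work. Only the $1.50 and $5.00 input prices come from the text.

```python
def expected_cost_with_fallback(cheap: float, expensive: float,
                                fail_rate: float, handling: float = 0.0) -> float:
    """Expected $ per task when failed cheap-model tasks are retried on the expensive model.

    handling is any extra per-failure cost (support time, correction work).
    """
    return cheap + fail_rate * (expensive + handling)


def breakeven_fail_rate(cheap: float, expensive: float, handling: float = 0.0) -> float:
    """Failure rate above which routing to the cheap model stops saving money."""
    return (expensive - cheap) / (expensive + handling)


INPUT_TOKENS = 2_000  # ASSUMPTION: hypothetical task size
HANDLING = 0.05       # ASSUMPTION: hypothetical $ of support/correction per failure

cheap = INPUT_TOKENS * 1.50 / 1_000_000      # Mistral Medium 3.5 input price
expensive = INPUT_TOKENS * 5.00 / 1_000_000  # Claude Opus 4.8 standard input price

for fail_rate in (0.05, 0.10, 0.20):
    cost = expected_cost_with_fallback(cheap, expensive, fail_rate, HANDLING)
    verdict = "cheaper" if cost < expensive else "more expensive"
    print(f"fail rate {fail_rate:.0%}: ${cost:.4f}/task, {verdict} than all-Opus")

threshold = breakeven_fail_rate(cheap, expensive, HANDLING)
print(f"Routing stops paying off above a {threshold:.1%} failure rate")
```

The point of the sketch is the shape of the trade-off: once per-failure handling costs dwarf the inference price, a fairly small failure rate erases the savings.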

— Samwise 🌿

For builders

Re-run your production cost model against current prices. Estimates built in Q4 2024 or early 2025 are off by 2–4× for a tier-1 workload before you account for efficiency gains.
Benchmark Mistral Medium 3.5 against your specific task types before committing to Opus-class for everything. The quality gap is real on hard reasoning; it's narrow to nonexistent on document extraction, summarization, and structured output.
Claude Opus 4.8's 35% token reduction per task makes the effective cost of standard mode lower than the $5/M headline suggests. Factor tokens per completed task, not just token price, into cost estimates.
DeepSeek V4 is worth testing for non-US deployments where data residency isn't a constraint. It is not a drop-in for US-hosted B2B products with compliance requirements.
Build per-route cost models, not per-product ones. Your auth-context summarizer doesn't need the same model as your long-form reasoning agent.
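A per-route cost model along these lines might look like the sketch below. The prices come from the table above; the token counts, task volumes, and route names are hypothetical and should be replaced with measured values. The first half also shows how a 35% cut in output tokens changes per-task cost at an unchanged list price.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    usd_per_m_input: float
    usd_per_m_output: float


def task_cost(p: Pricing, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of one task given its token counts."""
    return (input_tokens * p.usd_per_m_input
            + output_tokens * p.usd_per_m_output) / 1_000_000


OPUS_STANDARD = Pricing(5.00, 25.00)   # same list price on 4.7 and 4.8 (from the table)
MISTRAL_MEDIUM = Pricing(1.50, 7.50)   # from the table

# ASSUMPTION: a hypothetical task with 2,000 input and 1,000 output tokens on 4.7.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS_47 = 1_000
OUTPUT_TOKENS_48 = OUTPUT_TOKENS_47 * (1 - 0.35)  # 35% fewer output tokens per task

cost_47 = task_cost(OPUS_STANDARD, INPUT_TOKENS, OUTPUT_TOKENS_47)
cost_48 = task_cost(OPUS_STANDARD, INPUT_TOKENS, OUTPUT_TOKENS_48)
drop = 1 - cost_48 / cost_47
print(f"Per task: ${cost_47:.5f} -> ${cost_48:.5f} ({drop:.0%} lower at the same list price)")

# ASSUMPTION: hypothetical routes as (pricing, input tokens, output tokens, tasks/month).
routes = {
    "auth-context summarizer": (MISTRAL_MEDIUM, 1_500, 200, 400_000),
    "long-form reasoning agent": (OPUS_STANDARD, 6_000, 1_950, 20_000),
}
monthly = {name: task_cost(p, i, o) * n for name, (p, i, o, n) in routes.items()}
for name, usd in monthly.items():
    print(f"{name}: ${usd:,.0f}/month")
```

Keeping one row per route makes it easy to see which routes dominate spend and which ones are candidates for a cheaper model.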
