On Mistral Medium 3.5 at $1.50/M, Claude Opus 4.8's efficiency gains, DeepSeek V4 under a dollar, and why routing is now the engineering problem worth solving.
The mid-tier won. Most builder cost models are still pricing for the frontier.
Three things happened in the last five weeks that each got covered separately but together tell a different story.
Mistral Medium 3.5 launched at $1.50 per million input tokens — open weights, API access, 128B parameters, near-frontier coding benchmarks. DeepSeek V4 came out at sub-$1/M, one of four Chinese open-weight models that shipped in seventeen days. And Claude Opus 4.8 landed with 35% fewer output tokens per task than Opus 4.7 and a fast mode at $10/$50 per million tokens — down from $30/$150 on 4.7's equivalent.
Now go back to 2023. GPT-4 launched at $30 per million input tokens. The model that cost $30/M then has functional equivalents available today for $1.50. That's a 95% cost reduction in 38 months for roughly comparable capability on most production task types.
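The arithmetic behind that figure is easy to check. Here is a minimal sketch using only the two list prices and the 38-month window quoted above. The monthly rate is a derived illustration that assumes a smooth exponential decline, which real pricing did not follow.

```python
# Prices quoted in the text, in dollars per million input tokens.
GPT4_2023_PRICE = 30.00
MID_TIER_TODAY_PRICE = 1.50
MONTHS_ELAPSED = 38


def fractional_reduction(old_price: float, new_price: float) -> float:
    """Share of the old price that has disappeared (0.95 means 95% cheaper)."""
    return (old_price - new_price) / old_price


def implied_monthly_decline(old_price: float, new_price: float, months: int) -> float:
    """Constant monthly price drop that would compound to the same total.

    Assumption: a smooth exponential decline, used only as an illustration.
    """
    monthly_factor = (new_price / old_price) ** (1 / months)
    return 1 - monthly_factor


reduction = fractional_reduction(GPT4_2023_PRICE, MID_TIER_TODAY_PRICE)
monthly = implied_monthly_decline(GPT4_2023_PRICE, MID_TIER_TODAY_PRICE, MONTHS_ELAPSED)
print(f"total reduction: {reduction:.0%}")
print(f"implied monthly decline: {monthly:.1%}")
```

The monthly figure is only a way to sanity-check how stale an old budget might be. It says nothing about where prices go next.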
Most cost models built in 2024 don't reflect this. The default planning assumption in a lot of production budgets I've seen is still "we'll use a frontier model, budget accordingly." That was reasonable eighteen months ago. It's getting expensive to maintain.
Source spread
- Anthropic — Claude Opus 4.8 launch [builder] — primary source for fast mode pricing ($10/$50), efficiency gains (15% fewer turns, 35% fewer output tokens/task), and standard pricing ($5/$25 unchanged)
- Mistral AI — Medium 3.5 announcement [builder] — confirms $1.50/$7.50 API pricing, 128B dense parameters, open weights with permissive license
- Hugging Face — DeepSeek V4 Pro model card [builder] — sub-$1/M pricing, capability comparisons; one of four Chinese open-weight releases in 17 days
- Artificial Analysis — inference pricing benchmarks [skeptic] — independent price and quality benchmarks across providers; often diverges from vendor claims on specific task types; the reference I'd use before committing budget
| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|---|---|---|---|
| Claude Opus 4.8 standard | $5.00 | $25.00 | Unchanged from 4.7; 35% fewer output tokens/task |
| Claude Opus 4.8 fast mode | $10.00 | $50.00 | 2.5× speed vs standard; down from $30/$150 on 4.7 |
| Mistral Medium 3.5 | $1.50 | $7.50 | Open weights + API; 128B dense |
| DeepSeek V4 (API) | < $1.00 | < $3.00 | China-hosted only — data residency risk |
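To see how the table's per-token prices turn into a per-task cost, here is a rough sketch. The prices come from the table. The token counts per task are made-up placeholders, and the Opus 4.8 output count is simply the 4.7 placeholder scaled by the quoted 35% reduction, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List prices in dollars per million tokens, taken from the table above."""
    input_per_m: float
    output_per_m: float


def cost_per_task(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task given its token counts."""
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000


OPUS_STANDARD = Pricing(input_per_m=5.00, output_per_m=25.00)
MISTRAL_MEDIUM = Pricing(input_per_m=1.50, output_per_m=7.50)

# Hypothetical workload: these token counts are illustrative, not measured.
INPUT_TOKENS = 8_000
OUTPUT_TOKENS_ON_4_7 = 2_000
# Apply the quoted "35% fewer output tokens per task" to the placeholder.
OUTPUT_TOKENS_ON_4_8 = round(OUTPUT_TOKENS_ON_4_7 * (1 - 0.35))

opus_4_7_cost = cost_per_task(OPUS_STANDARD, INPUT_TOKENS, OUTPUT_TOKENS_ON_4_7)
opus_4_8_cost = cost_per_task(OPUS_STANDARD, INPUT_TOKENS, OUTPUT_TOKENS_ON_4_8)
# Assumes Mistral needs the same token counts as the 4.7 placeholder.
mistral_cost = cost_per_task(MISTRAL_MEDIUM, INPUT_TOKENS, OUTPUT_TOKENS_ON_4_7)

print(f"Opus 4.7 (placeholder task): ${opus_4_7_cost:.4f}")
print(f"Opus 4.8 standard:           ${opus_4_8_cost:.4f}")
print(f"Mistral Medium 3.5:          ${mistral_cost:.4f}")
```

With these placeholder numbers the per-task saving on Opus comes out smaller than 35%, because input tokens are billed the same as before. How much you actually save depends on your own input/output mix.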
Pros & cons
What's real:
- Near-frontier quality is accessible without self-hosting. Mistral Medium 3.5 at $1.50/M is the first time an open-weight model's API price is competitive with self-hosted setups for teams doing under roughly 5B tokens/month. The break-even math has moved.
- Claude Opus 4.8's efficiency gains change the effective cost even at unchanged token prices. Thirty-five percent fewer output tokens per task means the unchanged $5/$25 standard pricing gets you more work done than it did on 4.7, with the saving landing on the $25/M output side. Token price is one variable; tokens consumed per completed task is the other.
- The mid-tier fight is where the real competition is. The $1–$5/M range has genuine options now — not just "is there a cheap model" but "is there a cheap model good enough for my specific task."
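The break-even point between API and self-hosting can be sketched in a few lines. This is a simplified model under stated assumptions: the $10,000/month self-hosting figure and the 80/20 input/output mix are hypothetical placeholders, and only the Mistral list prices come from this piece. Plug in your own numbers before drawing conclusions.

```python
def blended_price_per_m(input_per_m: float, output_per_m: float,
                        input_share: float) -> float:
    """Average $/M tokens for a workload with the given input-token share."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


def break_even_tokens_m(self_host_fixed_monthly: float,
                        api_price_per_m: float,
                        self_host_marginal_per_m: float = 0.0) -> float:
    """Monthly volume (millions of tokens) above which self-hosting is cheaper.

    Below this volume the API is the cheaper option.
    """
    saving_per_m = api_price_per_m - self_host_marginal_per_m
    if saving_per_m <= 0:
        return float("inf")  # self-hosting never pays back its fixed cost
    return self_host_fixed_monthly / saving_per_m


# Mistral Medium 3.5 list prices from the article; the 80/20 mix is assumed.
api_price = blended_price_per_m(1.50, 7.50, input_share=0.8)
# Hypothetical: $10,000/month for GPUs plus ops time. Replace with your own figure.
threshold_m = break_even_tokens_m(self_host_fixed_monthly=10_000.0,
                                  api_price_per_m=api_price)
print(f"blended API price: ${api_price:.2f}/M")
print(f"break-even volume: {threshold_m / 1000:.1f}B tokens/month")
```

The result is very sensitive to the fixed-cost input, which is the number teams tend to underestimate.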
What deserves skepticism:
- Vendor quality claims need validation on your specific tasks. "Near-frontier quality" at $1.50/M is a benchmark claim, not a production claim. Which benchmark, on which task type, under what conditions, matters a lot.
- Price compression doesn't linearly reduce product cost. Inference is one line item. Engineering time, support, failure-mode correction, and latency tuning still grow with usage and aren't reflected in the per-token price.
- DeepSeek V4 at sub-$1/M has a real asterisk: China-hosted infrastructure, data residency questions, and the availability track record of the Chinese open-source providers during high-traffic periods. For US-hosted B2B products, factor those in.
What to do about it:
- Re-run your production cost model against current prices. Estimates built in Q4 2024 or early 2025 are off by 2–4× for a tier-1 workload before you account for efficiency gains.
- Benchmark Mistral Medium 3.5 against your specific task types before committing to Opus-class for everything. The quality gap is real on hard reasoning; it's narrow to nonexistent on document extraction, summarization, and structured output.
- Claude Opus 4.8's 35% reduction in output tokens per task makes the effective cost of standard mode lower than the $5/$25 headline suggests. Factor tokens per completed task, not just token price, into cost estimates.
- DeepSeek V4 is worth testing for non-US deployments where data residency isn't a constraint. It is not a drop-in for US-hosted B2B products with compliance requirements.
- Build per-route cost models, not per-product ones. Your auth-context summarizer doesn't need the same model as your long-form reasoning agent.
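A per-route cost model can start as a small table in code. The sketch below is one possible shape, not a prescribed design. The route names echo the examples above, but the monthly volumes are invented, and the model keys are plain labels for this example rather than real API model identifiers.

```python
from dataclasses import dataclass

# $/M tokens as (input, output), from the pricing table in this piece.
# The keys are illustrative labels, not official API model IDs.
PRICES = {
    "opus-4.8-standard": (5.00, 25.00),
    "mistral-medium-3.5": (1.50, 7.50),
}


@dataclass(frozen=True)
class Route:
    """One product surface that calls a model. All volumes here are hypothetical."""
    name: str
    model: str
    monthly_input_m: float   # millions of input tokens per month
    monthly_output_m: float  # millions of output tokens per month


def route_cost(route: Route) -> float:
    """Monthly inference cost in dollars for a single route."""
    input_price, output_price = PRICES[route.model]
    return route.monthly_input_m * input_price + route.monthly_output_m * output_price


def cost_by_route(routes: list[Route]) -> dict[str, float]:
    """Per-route monthly cost, so each line item can be reviewed on its own."""
    return {route.name: route_cost(route) for route in routes}


ROUTES = [
    Route("auth-context-summarizer", "mistral-medium-3.5", 400.0, 40.0),
    Route("long-form-reasoning-agent", "opus-4.8-standard", 100.0, 30.0),
]

per_route = cost_by_route(ROUTES)
# Baseline for comparison: the same traffic with every route on Opus standard.
all_on_opus = sum(
    route_cost(Route(r.name, "opus-4.8-standard", r.monthly_input_m, r.monthly_output_m))
    for r in ROUTES
)
for name, dollars in per_route.items():
    print(f"{name}: ${dollars:,.2f}/month")
print(f"routed total: ${sum(per_route.values()):,.2f}/month")
print(f"everything on Opus: ${all_on_opus:,.2f}/month")
```

Keeping the model choice as data on each route makes it cheap to re-run the comparison whenever prices or benchmark results change.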
Further reading
- Anthropic — Claude Opus 4.8 launch — efficiency data (15% fewer turns, 35% fewer tokens/task) and fast mode pricing
- Mistral AI — Medium 3.5 launch post — $1.50/$7.50 pricing, model card, and open-weight license terms
- DeepSeek V4 Pro on Hugging Face — model card, capability comparisons, hosting details
- Artificial Analysis — model pricing and quality benchmarks — independent benchmarking across providers; updated weekly