Vol. 1 · Edition 023 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

Claude Opus 4.7 input / M tokens: $5.00
DeepSeek V4 Pro input / M tokens: $0.44

Open Source
By Sam Taylor with Samwise

On GLM-5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek V4 — the real benchmark picture, the license trap you need to read, and what the 11× cost difference actually means for production decisions.

Four Chinese open-weight models in 17 days. The cost gap is now the whole conversation.

Source lean on this story (Anti-AI to Pro-AI): Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 2

The Chinese open-weight releases this April got covered mostly as a benchmark story. The SWE-bench Verified numbers were real and dramatic — DeepSeek V4 Pro at 80.6%, Kimi K2.6 at 80.2% — but benchmark parity with frontier closed models is something we've been watching inch forward for two years. The more interesting story is what happens to the closed-model argument when the cost gap reaches eleven times.

Four labs. Seventeen days. Z.ai's GLM-5.1 dropped April 7, 2026. MiniMax M2.7 landed on HuggingFace on April 12. Moonshot AI's Kimi K2.6 on April 20. DeepSeek V4 on April 24. All MoE architectures. All on HuggingFace. All MIT or Modified-MIT licensed.

The cost math

DeepSeek V4 Pro's official API price is $0.435 per million input tokens and $0.87 per million output tokens. DeepSeek launched V4-Pro at $1.74/$3.48, ran a 75%-off promotional rate, then made the promotional rate permanent before its expiry date. Against Claude Opus 4.7 at $5.00/$25.00 per million tokens, that's an 11× input cost advantage and a nearly 29× output cost advantage.

Kimi K2.6 runs $0.60/$2.50 per million tokens. GLM-5.1 runs $0.98/$3.08, after Z.ai (Zhipu) raised prices 8-17% following the launch, which is its own data point.

11×
DeepSeek V4 Pro input token cost advantage over Claude Opus 4.7 at official API rates

→ Source: DeepSeek API Docs

For agentic coding loops where output tokens are 30-50% of total spend, the end-to-end cost delta lands between the input and output ratios, at roughly 14-16× on list prices. That's not a rounding error. It's the difference between a product that pencils out and one that doesn't, and it's now achievable at frontier-grade SWE-bench Verified performance, not hobbyist-grade.
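A minimal sketch of the blended arithmetic, using only the list prices quoted above. The function name `blended_ratio` and the choice to measure the output share on the Opus bill are assumptions for illustration, not anything the vendors publish. Whatever share you plug in, the result falls between the input-price ratio and the output-price ratio.

```python
# List prices in USD per million tokens, as quoted in this article.
OPUS = {"input": 5.00, "output": 25.00}
V4_PRO = {"input": 0.435, "output": 0.87}


def blended_ratio(output_share_of_spend: float) -> float:
    """Return Opus cost divided by V4 Pro cost for the same token traffic.

    output_share_of_spend is the fraction of the *Opus* bill that goes to
    output tokens (an assumption; measure it on your own invoices).
    """
    if not 0.0 <= output_share_of_spend <= 1.0:
        raise ValueError("output_share_of_spend must be between 0 and 1")

    input_ratio = OPUS["input"] / V4_PRO["input"]
    output_ratio = OPUS["output"] / V4_PRO["output"]

    # Each dollar of Opus spend shrinks by its own ratio when moved to V4 Pro.
    v4_cost_per_opus_dollar = (
        (1.0 - output_share_of_spend) / input_ratio
        + output_share_of_spend / output_ratio
    )
    return 1.0 / v4_cost_per_opus_dollar


if __name__ == "__main__":
    for share in (0.0, 0.3, 0.5, 1.0):
        print(f"output share {share:.0%}: {blended_ratio(share):.1f}x")
```

The sketch ignores prompt caching, batch discounts, and any difference in how many tokens each model needs to finish the same task, all of which move the real number.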

Four models, 17 days: specs and cost
Model           | Released     | SWE-bench     | Input / M    | License
GLM-5.1 (Z.ai)  | Apr 7, 2026  | 58.4 Pro      | $0.98        | MIT
MiniMax M2.7    | Apr 12, HF   | 56.2 Pro      | (not listed) | Modified-MIT ⚠
Kimi K2.6       | Apr 20, 2026 | 80.2 Verified | $0.60        | Modified-MIT
DeepSeek V4 Pro | Apr 24, 2026 | 80.6 Verified | $0.44        | MIT
Claude Opus 4.7 | Apr 16, 2026 | (not listed)  | $5.00        | Proprietary
GLM-5.1 and MiniMax M2.7 scores are SWE-bench Pro. Kimi K2.6 and DeepSeek V4 Pro scores are SWE-bench Verified. These are different benchmarks — not directly comparable numbers.

Pros & cons

What's real:

  • The SWE-bench Verified numbers for DeepSeek V4 Pro (80.6%) and Kimi K2.6 (80.2%) are on the same canonical benchmark. SWE-bench Verified uses real GitHub issues resolved by real PRs, with methodology published. Multiple independent groups have reproduced results in this range. These are not marketing-only figures.
  • DeepSeek V4 Pro and GLM-5.1 carry true MIT licenses. Commercial use, fine-tuning, redistribution, no royalty fees. That's the cleanest possible open-source story.
  • The API cost differential is real and not promotional anymore. DeepSeek made the 75%-off rate permanent. The rate is $0.44/M input.
  • All four models have weights on HuggingFace and can be self-hosted. The API route is more realistic for most shops; self-hosting V4-Pro at 1.6T parameters requires serious hardware.

What deserves a side-eye:

  • MiniMax M2.7's license switched from MIT to Modified-MIT on HuggingFace upload. The Modified-MIT restricts commercial use in ways that MIT does not. If you planned a deployment based on the MIT announcement, you have a different legal situation than you thought.
  • SWE-bench Pro and SWE-bench Verified are genuinely different benchmarks testing different task distributions. Coverage routinely conflates GLM-5.1's Pro score with Kimi K2.6's Verified score as if they're interchangeable. They're not.
  • Chinese-hosted APIs have data residency implications for regulated industries. DeepSeek and Kimi route through Chinese infrastructure by default. Not a dealbreaker for most workloads; potentially a dealbreaker for finance, healthcare, or defense-adjacent products.
  • These models are five to seven weeks old. Production stability data is still thin. The Claude Opus line has a longer production track record behind it, even though Opus 4.7 itself shipped on April 16. "Works on benchmarks" and "handles production edge cases reliably" are not the same thing, and that gap closes over time, not immediately.

For builders

  • Run DeepSeek V4 Pro on your coding eval suite before dismissing it. On agentic loops, the cost delta is large enough to justify the evaluation time.
  • Back-of-envelope: price your monthly Opus output token volume at $0.87 versus $25.00 per million tokens. The difference is your maximum upside on output spend if V4-Pro performs adequately on your tasks.
  • Read MiniMax M2.7's actual HuggingFace license before any commercial deployment. The terms changed from the announcement.
  • GLM-5.1 and DeepSeek V4 have the cleanest licenses (true MIT). If you're planning to fine-tune and redistribute, start there.
  • For regulated-industry deployments, verify data residency terms before sending production traffic to DeepSeek or Kimi APIs.
  • SWE-bench Pro and SWE-bench Verified are different benchmarks. Do not compare GLM-5.1's Pro score against Kimi K2.6's Verified score; they are measuring different things.
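
The back-of-envelope step in the list above can be sketched as a few lines of Python. The prices are the list rates quoted in this article; the model keys, function names, and the example monthly volumes are hypothetical placeholders, so swap in the numbers from your own billing export.

```python
# USD per million tokens, list prices quoted in this article.
PRICES = {
    "claude-opus-4.7": {"input": 5.00, "output": 25.00},
    "deepseek-v4-pro": {"input": 0.435, "output": 0.87},
}


def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Monthly bill in USD for token volumes given in millions of tokens."""
    price = PRICES[model]
    return input_m * price["input"] + output_m * price["output"]


def max_upside(input_m: float, output_m: float) -> float:
    """Upper bound on monthly savings from moving the same traffic to V4 Pro.

    It is an upper bound because it assumes V4 Pro needs no extra tokens,
    retries, or fallbacks to reach the same result.
    """
    return (
        monthly_cost("claude-opus-4.7", input_m, output_m)
        - monthly_cost("deepseek-v4-pro", input_m, output_m)
    )


if __name__ == "__main__":
    # Hypothetical workload: 400M input and 100M output tokens per month.
    input_m, output_m = 400.0, 100.0
    print(f"Opus 4.7: ${monthly_cost('claude-opus-4.7', input_m, output_m):,.2f}")
    print(f"V4 Pro:   ${monthly_cost('deepseek-v4-pro', input_m, output_m):,.2f}")
    print(f"Upside:   ${max_upside(input_m, output_m):,.2f}")
```

Treat the result as a ceiling, not a forecast: it only tells you whether running the eval is worth the engineering time.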
