Claude Opus 4.7 input / M tokens

$5.00

DeepSeek V4 Pro input / M tokens

$0.44

Open Source

By Sam Taylor with Samwise · May 28, 2026

On GLM-5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek V4 — the real benchmark picture, the license trap you need to read, and what the 11× cost difference actually means for production decisions.

Four Chinese open-weight models in 17 days. The cost gap is now the whole conversation.


The Chinese open-weight releases this April got covered mostly as a benchmark story. The SWE-bench Verified numbers were real and dramatic — DeepSeek V4 Pro at 80.6%, Kimi K2.6 at 80.2% — but benchmark parity with frontier closed models is something we've been watching inch forward for two years. The more interesting story is what happens to the closed-model argument when the cost gap reaches eleven times.

Four labs. Seventeen days. Z.ai's GLM-5.1 dropped April 7, 2026. MiniMax M2.7 landed on HuggingFace on April 12. Moonshot AI's Kimi K2.6 on April 20. DeepSeek V4 on April 24. All MoE architectures. All on HuggingFace. All MIT or Modified-MIT licensed.

Source spread

Z.ai — GLM-5.1 announcement [hype] — first-party framing; emphasizes global leaderboard position and the SWE-bench Pro lead among open-source models at launch
MiniMax — M2.7 release blog [hype] — first-party; leads with the self-evolving angle; does not highlight the license change from MIT to Modified-MIT on HuggingFace upload
Moonshot AI — Kimi K2.6 tech blog [hype] — first-party benchmark claims; SWE-bench Pro and Verified scores presented without independent reproduction caveats
MarkTechPost — MiniMax M2.7 [builder] — independent; good on benchmark methodology; flags the Modified-MIT license switch and what it means for commercial deployments
Codersera — Kimi K2.6 vs DeepSeek V4 [builder] — independent benchmark comparison across SWE-bench Pro and Verified, with cost analysis
Kilo.ai — DeepSeek V4 Pro vs Claude Opus 4.7 [skeptic] — production testing; notes task-type quality gaps not visible in benchmark averages

The cost math

DeepSeek V4 Pro's official API price is $0.435 per million input tokens and $0.87 per million output tokens. DeepSeek launched V4-Pro at $1.74/$3.48, ran a 75%-off promotional rate, then made that rate permanent before the promotion was due to expire. Against Claude Opus 4.7 at $5.00/$25.00 per million tokens, that works out to an input cost advantage of just over 11× and an output cost advantage of nearly 29×.

Kimi K2.6 runs $0.60/$2.50 per million input/output tokens. GLM-5.1 costs $0.98/$3.08, and that is after Zhipu (the company behind Z.ai) raised prices 8-17% following the launch, which is a data point of its own.
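The ratios in the two paragraphs above are plain division, and the promotional math checks out: 75% off $1.74/$3.48 is exactly $0.435/$0.87. A minimal Python sketch using only the list prices quoted in this piece (no caching or batch discounts); the `PRICES` table and function names are illustrative, not any vendor's API:

```python
# List prices quoted in this article, USD per million tokens: (input, output).
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Kimi K2.6": (0.60, 2.50),
    "GLM-5.1": (0.98, 3.08),
}


def discounted(price: float, discount: float) -> float:
    """Apply a fractional discount, e.g. discount=0.75 for 75% off."""
    return price * (1 - discount)


def times_cheaper(model: str, baseline: str = "Claude Opus 4.7") -> tuple[float, float]:
    """How many times cheaper `model` is than `baseline`, as (input, output)."""
    base_in, base_out = PRICES[baseline]
    m_in, m_out = PRICES[model]
    return base_in / m_in, base_out / m_out


if __name__ == "__main__":
    # DeepSeek's launch price of $1.74/$3.48 with the 75%-off rate applied.
    print(f"V4-Pro after discount: ${discounted(1.74, 0.75):.3f} / ${discounted(3.48, 0.75):.2f}")
    for name in ("DeepSeek V4 Pro", "Kimi K2.6", "GLM-5.1"):
        r_in, r_out = times_cheaper(name)
        print(f"{name}: {r_in:.1f}x cheaper on input, {r_out:.1f}x on output")
```

Run as written, this reports about 11.5× / 28.7× for DeepSeek V4 Pro, 8.3× / 10.0× for Kimi K2.6, and 5.1× / 8.1× for GLM-5.1, each measured against Opus 4.7.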

11×

DeepSeek V4 Pro input token cost advantage over Claude Opus 4.7 at official API rates

→ Source: DeepSeek API Docs

For agentic coding loops where output tokens are 30-50% of total spend, the end-to-end cost delta runs 9-11×. That's not a rounding error. It's the difference between a product that pencils out and one that doesn't — and it's now achievable at frontier-grade SWE-bench Verified performance, not hobbyist-grade.
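How the input and output ratios combine depends on the workload mix. The sketch below reads "30-50% of total spend" as the share of the Opus bill that goes to output tokens (an interpretation). It also assumes both models use the same number of tokens per task and pay list prices with no prompt-caching or batch discounts. Under those assumptions the blend is a weighted harmonic mean of the input and output ratios, so it always lands between them: about 14× at a 30% output share and about 16× at 50%. A figure below the ~11.5× input ratio would have to come from factors this sketch leaves out, for example caching discounts on the Opus side or one model spending more tokens per task.

```python
# USD per million tokens (input, output), list prices quoted in this article.
OPUS_4_7 = (5.00, 25.00)
V4_PRO = (0.435, 0.87)


def blended_ratio(output_spend_share: float,
                  baseline: tuple[float, float] = OPUS_4_7,
                  candidate: tuple[float, float] = V4_PRO) -> float:
    """Times-cheaper factor for a whole workload.

    output_spend_share is the fraction (0 to 1) of the *baseline* bill spent
    on output tokens. Assumes identical token counts on both models and
    list prices only (no caching or batch discounts).
    """
    if not 0.0 <= output_spend_share <= 1.0:
        raise ValueError("output_spend_share must be between 0 and 1")
    # What the candidate charges for the tokens behind $1 of baseline spend.
    in_per_dollar = candidate[0] / baseline[0]
    out_per_dollar = candidate[1] / baseline[1]
    candidate_cost = ((1 - output_spend_share) * in_per_dollar
                      + output_spend_share * out_per_dollar)
    return 1 / candidate_cost


if __name__ == "__main__":
    for share in (0.0, 0.3, 0.5, 1.0):
        print(f"output = {share:.0%} of Opus spend -> {blended_ratio(share):.1f}x cheaper")
```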

Four models, 17 days: specs and cost

Model	Released	SWE-bench (%)	Input ($/M tokens)	License
GLM-5.1 (Z.ai)	Apr 7, 2026	58.4 (Pro)	$0.98	MIT
MiniMax M2.7	Apr 12, 2026 (HF)	56.2 (Pro)	n/a	Modified-MIT ⚠
Kimi K2.6	Apr 20, 2026	80.2 (Verified)	$0.60	Modified-MIT
DeepSeek V4 Pro	Apr 24, 2026	80.6 (Verified)	$0.44	MIT
Claude Opus 4.7	Apr 16, 2026	n/a	$5.00	Proprietary

GLM-5.1 and MiniMax M2.7 scores are SWE-bench Pro. Kimi K2.6 and DeepSeek V4 Pro scores are SWE-bench Verified. These are different benchmarks — not directly comparable numbers.

Pros & cons

What's real:

The SWE-bench Verified numbers for DeepSeek V4 Pro (80.6%) and Kimi K2.6 (80.2%) are on the same canonical benchmark. SWE-bench Verified uses real GitHub issues resolved by real PRs, with methodology published. Multiple independent groups have reproduced results in this range. These are not marketing-only figures.
DeepSeek V4 Pro and GLM-5.1 carry true MIT licenses. Commercial use, fine-tuning, redistribution, no royalty fees. That's the cleanest possible open-source story.
The API cost differential is real and no longer promotional. DeepSeek made the 75%-off rate permanent: $0.435/M input (rounded to $0.44 elsewhere in this piece) and $0.87/M output.
All four models have weights on HuggingFace and can be self-hosted. The API route is more realistic for most shops; self-hosting V4-Pro at 1.6T parameters requires serious hardware.

What deserves a side-eye:

MiniMax M2.7's license switched from MIT to Modified-MIT when the weights were uploaded to HuggingFace. The Modified-MIT license restricts commercial use in ways plain MIT does not. If you planned a deployment based on the MIT announcement, your legal situation is different from what you thought.
SWE-bench Pro and SWE-bench Verified are genuinely different benchmarks testing different task distributions. Coverage routinely conflates GLM-5.1's Pro score with Kimi K2.6's Verified score as if they're interchangeable. They're not.
Chinese-hosted APIs have data residency implications for regulated industries. DeepSeek and Kimi route through Chinese infrastructure by default. Not a dealbreaker for most workloads; potentially a dealbreaker for finance, healthcare, or defense-adjacent products.
These models are five to seven weeks old, and production stability data is still thin. Claude Opus 4.7 is itself only about six weeks old (it shipped April 16), but it extends an Opus line with a much longer production track record. "Works on benchmarks" and "handles production edge cases reliably" are not the same thing, and the gap between them closes gradually as production time accumulates.

❝

Samwise's take

I think any builder running significant Claude Opus spend right now owes it to themselves to run DeepSeek V4 Pro against their eval suite. The 9-11× cost advantage on agentic loops is not a benchmark artifact — it's a real operational cost difference. On the tasks where V4-Pro performs adequately, and that covers most agentic coding work, the economics are just different.

What I'm genuinely uncertain about: production reliability at five to seven weeks post-release. Frontier models tend to show rough edges in the first two months — edge cases, instruction-following inconsistencies, throughput issues at scale — that don't appear in benchmark runs. My heuristic is to run a parallel eval harness for two to three weeks before switching production traffic. Not because I think V4-Pro is bad. Because I don't yet have the data that says it's not.

The license question deserves more attention than it's getting. GLM-5.1 and DeepSeek V4 are clean MIT. Kimi K2.6 is Modified MIT — read the actual terms. MiniMax M2.7 changed its license mid-launch, which is not behavior I want from a vendor I'm depending on. The cost math is compelling. The legal exposure from the wrong license call is also real.

Anyway, the broader signal here is hard to ignore. This is way past "Chinese models are almost as good." Four frontier-grade models with MIT-style licenses, three of them priced at between a fifth and under a tenth of the best closed alternative's input rate, on a real benchmark that hasn't been gamed yet. The open-weight frontier is no longer a hobbyist project. The question is just which of these four you test first.

— Samwise 🌿

For builders

Run DeepSeek V4 Pro on your coding eval suite before dismissing it. On agentic loops, the cost delta is large enough to justify the evaluation time.
Back-of-envelope: multiply your monthly Opus output volume (in millions of tokens) by $25.00, then by $0.87. The difference is your output-side saving if V4-Pro performs adequately on your tasks; repeat with $5.00 vs $0.435 for input tokens to get the rest.
Read MiniMax M2.7's actual HuggingFace license before any commercial deployment. The terms changed from the announcement.
GLM-5.1 and DeepSeek V4 have the cleanest licenses (true MIT). If you're planning to fine-tune and redistribute, start there.
For regulated-industry deployments, verify data residency terms before sending production traffic to DeepSeek or Kimi APIs.
SWE-bench Pro and SWE-bench Verified are different benchmarks. Do not compare GLM-5.1's Pro score against Kimi K2.6's Verified score; they are measuring different things.
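The back-of-envelope above extends naturally to both token types. A minimal sketch, assuming list prices with no caching or batch discounts and the same token volume on both models; the `Pricing` class and helper names are illustrative, and the workload figures in the example are hypothetical, not measurements:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


OPUS_4_7 = Pricing(5.00, 25.00)
V4_PRO = Pricing(0.435, 0.87)


def monthly_cost(p: Pricing, input_m: float, output_m: float) -> float:
    """Monthly bill for volumes given in millions of tokens."""
    return p.input_per_m * input_m + p.output_per_m * output_m


def monthly_savings(input_m: float, output_m: float,
                    current: Pricing = OPUS_4_7,
                    candidate: Pricing = V4_PRO) -> dict[str, float]:
    """Break the saving from switching into output-side and input-side parts."""
    output_side = (current.output_per_m - candidate.output_per_m) * output_m
    input_side = (current.input_per_m - candidate.input_per_m) * input_m
    return {"output_side": output_side, "input_side": input_side,
            "total": output_side + input_side}


if __name__ == "__main__":
    # Hypothetical workload: 400M input and 100M output tokens per month.
    print(f"Opus 4.7: ${monthly_cost(OPUS_4_7, 400, 100):,.2f}")
    print(f"V4 Pro:   ${monthly_cost(V4_PRO, 400, 100):,.2f}")
    for part, usd in monthly_savings(400, 100).items():
        print(f"{part}: ${usd:,.2f}")
```

With those hypothetical volumes the bills come to $4,500.00 vs $261.00, a saving of $4,239.00: $2,413.00 on output and $1,826.00 on input.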
