On GLM-5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek V4 — the real benchmark picture, the license trap you need to read, and what the 11× cost difference actually means for production decisions.
Four Chinese open-weight models in 17 days. The cost gap is now the whole conversation.
[Stance meter, Anti-AI to Pro-AI: Anti-AI 0, Skeptic 1, Neutral 0, Pro (practical) 2, Pro (hyped) 2]
The Chinese open-weight releases this April got covered mostly as a benchmark story. The SWE-bench Verified numbers were real and dramatic — DeepSeek V4 Pro at 80.6%, Kimi K2.6 at 80.2% — but benchmark parity with frontier closed models is something we've been watching inch forward for two years. The more interesting story is what happens to the closed-model argument when the cost gap reaches eleven times.
Four labs. Seventeen days. Z.ai's GLM-5.1 dropped April 7, 2026. MiniMax M2.7 landed on HuggingFace on April 12. Moonshot AI's Kimi K2.6 on April 20. DeepSeek V4 on April 24. All MoE architectures. All on HuggingFace. All MIT or Modified-MIT licensed.
Source spread
- Z.ai — GLM-5.1 announcement [hype] — first-party framing; emphasizes global leaderboard position and the SWE-bench Pro lead among open-source models at launch
- MiniMax — M2.7 release blog [hype] — first-party; leads with the self-evolving angle; does not highlight the license change from MIT to Modified-MIT on HuggingFace upload
- Moonshot AI — Kimi K2.6 tech blog [hype] — first-party benchmark claims; SWE-bench Pro and Verified scores presented without independent reproduction caveats
- MarkTechPost — MiniMax M2.7 [builder] — independent; good on benchmark methodology; flags the Modified-MIT license switch and what it means for commercial deployments
- Codersera — Kimi K2.6 vs DeepSeek V4 [builder] — independent benchmark comparison across SWE-bench Pro and Verified, with cost analysis
- Kilo.ai — DeepSeek V4 Pro vs Claude Opus 4.7 [skeptic] — production testing; notes task-type quality gaps not visible in benchmark averages
The cost math
DeepSeek V4 Pro's official API price is $0.435 per million input tokens and $0.87 per million output tokens. DeepSeek launched V4-Pro at $1.74/$3.48, ran a 75%-off promotional rate, then made the promotional rate permanent before its expiry date. Against Claude Opus 4.7 at $5.00/$25.00 per million tokens, that's roughly an 11× input cost advantage and a 29× output cost advantage.
Kimi K2.6 runs $0.60/$2.50 per million tokens. GLM-5.1 sits at $0.98/$3.08 after Z.ai (Zhipu) raised prices 8-17% following the launch, which is its own data point.
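To make the comparison concrete, here is a minimal sketch that recomputes each model's price ratio against Claude Opus 4.7 from the per-million-token prices quoted above. The dictionary layout and the function name `ratios_vs_opus` are illustrative assumptions, not part of any vendor's API, and the prices are the ones cited in this piece rather than live values.

```python
# Prices in USD per million tokens as quoted in this article: (input, output).
PRICES = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Kimi K2.6": (0.60, 2.50),
    "GLM-5.1": (0.98, 3.08),
}
OPUS = (5.00, 25.00)  # Claude Opus 4.7, as quoted in this article


def ratios_vs_opus(prices, baseline=OPUS):
    """Return {model: (input_ratio, output_ratio)} relative to the baseline."""
    return {
        name: (baseline[0] / inp, baseline[1] / out)
        for name, (inp, out) in prices.items()
    }


if __name__ == "__main__":
    for name, (r_in, r_out) in ratios_vs_opus(PRICES).items():
        print(f"{name}: {r_in:.1f}x input, {r_out:.1f}x output")
```

Swapping in current prices from each provider's pricing page is the only change needed to rerun the comparison.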
For agentic coding loops where output tokens are 30-50% of total spend, the end-to-end delta at list prices and equal token counts lands between the input and output ratios, so 11× is a floor rather than a ceiling. That's not a rounding error. It's the difference between a product that pencils out and one that doesn't, and it's now achievable at frontier-grade SWE-bench Verified performance, not hobbyist-grade.
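A blended figure depends on how the bill splits between input and output. The sketch below is one way to estimate it, under two loud assumptions: the output share is measured as a fraction of the Opus bill, and both models consume the same number of tokens for the same work. The function name `blended_ratio` is hypothetical. Cache discounts, retries, or longer outputs from the cheaper model would all move the real number.

```python
def blended_ratio(output_share, input_ratio, output_ratio):
    """Overall cost ratio when `output_share` of the baseline bill is output tokens.

    Assumes identical token counts on both models, so each slice of the
    baseline bill shrinks by its own per-token price ratio.
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Cost on the cheaper model, expressed as a fraction of the baseline bill.
    cheaper_cost = (1 - output_share) / input_ratio + output_share / output_ratio
    return 1 / cheaper_cost


# List-price ratios for DeepSeek V4 Pro vs Claude Opus 4.7, from the prices quoted above.
INPUT_RATIO = 5.00 / 0.435
OUTPUT_RATIO = 25.00 / 0.87

if __name__ == "__main__":
    for share in (0.3, 0.4, 0.5):
        ratio = blended_ratio(share, INPUT_RATIO, OUTPUT_RATIO)
        print(f"output share {share:.0%}: {ratio:.1f}x")
```

Under these assumptions the result is a weighted harmonic mean, so it always sits between the two per-token ratios. Plug in your own measured output share rather than relying on the 30-50% range.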
| Model | Released | SWE-bench score (variant) | Input price (USD per 1M tokens) | License |
|---|---|---|---|---|
| GLM-5.1 (Z.ai) | Apr 7, 2026 | 58.4% (Pro) | $0.98 | MIT |
| MiniMax M2.7 | Apr 12, 2026 (HuggingFace) | 56.2% (Pro) | not listed | Modified-MIT ⚠ |
| Kimi K2.6 | Apr 20, 2026 | 80.2% (Verified) | $0.60 | Modified-MIT |
| DeepSeek V4 Pro | Apr 24, 2026 | 80.6% (Verified) | $0.435 | MIT |
| Claude Opus 4.7 | Apr 16, 2026 | not listed | $5.00 | Proprietary |
Pros & cons
What's real:
- The SWE-bench Verified numbers for DeepSeek V4 Pro (80.6%) and Kimi K2.6 (80.2%) are on the same canonical benchmark. SWE-bench Verified uses real GitHub issues resolved by real PRs, with methodology published. Multiple independent groups have reproduced results in this range. These are not marketing-only figures.
- DeepSeek V4 Pro and GLM-5.1 carry true MIT licenses. Commercial use, fine-tuning, redistribution, no royalty fees. That's the cleanest possible open-source story.
- The API cost differential is real and no longer promotional. DeepSeek made the 75%-off rate permanent, so the input rate is $0.435 per million tokens.
- All four models have weights on HuggingFace and can be self-hosted. The API route is more realistic for most shops; self-hosting V4-Pro at 1.6T parameters requires serious hardware.
What deserves a side-eye:
- MiniMax M2.7's license switched from MIT to Modified-MIT on HuggingFace upload. The Modified-MIT license restricts commercial use in ways that MIT does not. If you planned a deployment based on the MIT announcement, you have a different legal situation than you thought.
- SWE-bench Pro and SWE-bench Verified are genuinely different benchmarks testing different task distributions. Coverage routinely conflates GLM-5.1's Pro score with Kimi K2.6's Verified score as if they're interchangeable. They're not.
- Chinese-hosted APIs have data residency implications for regulated industries. DeepSeek and Kimi route through Chinese infrastructure by default. Not a dealbreaker for most workloads; potentially a dealbreaker for finance, healthcare, or defense-adjacent products.
- These models are five to seven weeks old, and production stability data is still thin. The Claude Opus line has a far longer production track record. "Works on benchmarks" and "handles production edge cases reliably" are not the same thing, and that gap closes over time, not immediately.
What to do:
- Run DeepSeek V4 Pro on your coding eval suite before dismissing it. On agentic loops, the cost delta is large enough to justify the evaluation time.
- Back-of-envelope: multiply your monthly Opus output volume, in millions of tokens, by $25.00 and by $0.87. The difference between the two is your maximum upside on output spend if V4-Pro performs adequately on your tasks.
- Read MiniMax M2.7's actual HuggingFace license before any commercial deployment. The terms changed from the announcement.
- GLM-5.1 and DeepSeek V4 have the cleanest licenses (true MIT). If you're planning to fine-tune and redistribute, start there.
- For regulated-industry deployments, verify data residency terms before sending production traffic to DeepSeek or Kimi APIs.
- SWE-bench Pro and SWE-bench Verified are different benchmarks. Do not compare GLM-5.1's Pro score against Kimi K2.6's Verified score; they are measuring different things.
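The back-of-envelope step in the list above can be sketched in a few lines. This is an illustration only: the function name `max_monthly_output_savings` and the 400M-token volume are hypothetical, the prices are the ones quoted in this piece, and the result is an upper bound that ignores input tokens, retries, and any quality gap.

```python
OPUS_OUTPUT_PRICE = 25.00    # USD per million output tokens, as quoted in this article
V4_PRO_OUTPUT_PRICE = 0.87   # USD per million output tokens, as quoted in this article


def max_monthly_output_savings(output_tokens_millions):
    """Upper bound on monthly savings from output tokens alone, in USD."""
    if output_tokens_millions < 0:
        raise ValueError("token volume cannot be negative")
    opus_cost = output_tokens_millions * OPUS_OUTPUT_PRICE
    v4_cost = output_tokens_millions * V4_PRO_OUTPUT_PRICE
    return opus_cost - v4_cost


if __name__ == "__main__":
    volume = 400  # hypothetical: 400M output tokens per month
    print(f"Max output-side savings: ${max_monthly_output_savings(volume):,.2f}")
```

Treat the figure as a ceiling to weigh against evaluation time, not as a forecast.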
Further reading
- DeepSeek V4-Pro on HuggingFace — weights, architecture card, license text
- DeepSeek API pricing — current official pricing
- Kimi K2.6 tech blog — Moonshot AI — first-party benchmarks and architecture; SWE-bench Pro and Verified claims sourced here
- MiniMax M2.7 release blog — first-party; compare with MarkTechPost coverage which flags the license change
- MiniMax M2.7 on MarkTechPost — independent; good on license switch and benchmark context
- SWE-bench Verified methodology — required reading before citing these numbers in production decisions
- Kilo.ai — DeepSeek V4 Pro tested against Claude Opus 4.7 — independent production evaluation with task-type breakdown
- Codersera — Kimi K2.6 vs DeepSeek V4 — head-to-head benchmark comparison