Vol. 1 · Edition 027 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

GPT-5.6 FAMILY · Sol · Terra · Luna · Model Launch
By Sam Taylor with Samwise

On Sol, Terra, and Luna's capabilities and pricing, whether Terminal-Bench 2.1 is the right number to cite, and why two government-constrained frontier launches in thirty days is a policy signal worth naming.

GPT-5.6 Sol is OpenAI's strongest work yet. The government decides who gets in first.

Source lean on this story (← Anti-AI · Pro-AI →)
Anti-AI: 0 · Skeptic: 1 · Neutral: 1 · Pro (practical): 0 · Pro (hyped): 1

OpenAI unveiled GPT-5.6 on June 26 with three tiers. Sol, the flagship. Terra, balanced for high-volume business work. Luna, fast and cheap. On Terminal-Bench 2.1, a command-line coding workflow benchmark, Sol Ultra hits 91.9% compared to GPT-5.5 at 83.4% and Claude Mythos 5 at 88.0%. The naming shift matters too: the number (5.6) marks the generation, the name (Sol/Terra/Luna) marks the capability tier — a cleaner structure for a lab shipping fast.
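To put those three scores in perspective, here is a minimal sketch of the arithmetic, using only the numbers quoted above. The function names are mine, and reading "failure rate" as 100 minus the reported score is my assumption about how to interpret the benchmark, not something OpenAI published.

```python
# Scores as quoted in the announcement (percent of tasks passed).
SCORES = {
    "GPT-5.6 Sol Ultra": 91.9,
    "Claude Mythos 5": 88.0,
    "GPT-5.5": 83.4,
}

def point_gap(a: float, b: float) -> float:
    """Absolute gap between two scores, in percentage points."""
    return round(a - b, 1)

def failure_reduction(new: float, old: float) -> float:
    """Relative drop in failure rate (100 - score), as a percentage."""
    old_fail = 100.0 - old
    new_fail = 100.0 - new
    return round(100.0 * (old_fail - new_fail) / old_fail, 1)

sol = SCORES["GPT-5.6 Sol Ultra"]
for rival in ("GPT-5.5", "Claude Mythos 5"):
    gap = point_gap(sol, SCORES[rival])
    drop = failure_reduction(sol, SCORES[rival])
    print(f"vs {rival}: +{gap} points, {drop}% fewer failures")
```

Read this way, the 8.5-point gap over GPT-5.5 works out to roughly half as many failed tasks, which is why the jump can feel bigger than the headline numbers suggest.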

Pricing is competitive. Luna lands at $1 input / $6 output per million tokens, which puts it in sub-$2 territory alongside the Chinese open-weight models that have been eating at this price point for months. Terra is $2.50/$15. Sol is $5/$30. The new features on Sol are real: an "ultra mode" that orchestrates subagents for complex tasks, a new "max reasoning effort" level, and explicit cache breakpoints with a 30-minute minimum cache life.
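Here is a rough sketch of what those list prices mean for a bill. The prices come from the announcement; the workload size (10M input tokens, 2M output tokens) is a placeholder I made up, and the calculation ignores caching and any volume discounts.

```python
# List prices from the announcement, in dollars per million tokens: (input, output).
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}

def workload_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at list price, with no caching discount applied."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical workload: 10M input tokens and 2M output tokens.
for tier in PRICES:
    print(f"{tier}: ${workload_cost(tier, 10_000_000, 2_000_000):,.2f}")
```

Every tier charges six times as much for output as for input, so output-heavy workloads will feel the tier choice more than prompt-heavy ones.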

But you probably can't access any of them yet. At the request of the US government, OpenAI held back a broad launch. Roughly 20 government-vetted trusted partners have access via API and Codex only. ChatGPT doesn't have these models. Sam Altman said the company had originally planned a wider release but shifted course when the government asked. Expected broad availability: mid-July, two to three weeks out.

One thing I want to name directly: this is the second frontier-AI launch gated by the US government in 30 days. Anthropic's Fable 5 and Mythos 5 were pulled under an export control directive on June 12. Now GPT-5.6 launches into a restricted preview, also at government request. Two labs. Two launches. One waiting room. Whatever framework is forming here is forming in real time.
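A quick sanity check on the calendar math, as a sketch. The month and day values are the ones reported above; the year is a placeholder I picked only because Python needs one, and it does not change the result.

```python
from datetime import date, timedelta

# Assumption: the source gives month and day only, so the year is a placeholder.
ASSUMED_YEAR = 2026

anthropic_pull = date(ASSUMED_YEAR, 6, 12)
gpt56_launch = date(ASSUMED_YEAR, 6, 26)

# Gap between the two government-gated launches.
days_between = (gpt56_launch - anthropic_pull).days

# "Two to three weeks out" from the GPT-5.6 preview date.
earliest = gpt56_launch + timedelta(weeks=2)
latest = gpt56_launch + timedelta(weeks=3)

print(f"Days between the two gated launches: {days_between}")
print(f"Two to three weeks out: {earliest:%B %d} to {latest:%B %d}")
```

The two events sit two weeks apart, comfortably inside the 30-day window, and "two to three weeks out" lands in the middle of July.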

91.9%
Sol Ultra score on Terminal-Bench 2.1 command-line coding — vs GPT-5.5 at 83.4% and Claude Mythos 5 at 88.0%

→ Source: OpenAI preview announcement

Pros & cons

What's real:

  • The three-tier structure is clear and useful. Sol/Terra/Luna gives builders an actual choice, not just "use the big model or roll your own routing." Luna at $1/$6 is a genuine high-throughput option.
  • Sol Ultra's lead over GPT-5.5 (83.4% → 91.9% on Terminal-Bench 2.1) and its edge over Claude Mythos 5 (88.0%) are meaningful, if the benchmark holds up under independent testing.
  • Ultra mode subagent orchestration is an API-level feature, not a product-layer trick. Builders running complex multi-step agentic workflows can use it directly.
  • Cache breakpoints and 30-minute minimum cache life are the kind of infrastructure details that save real money at scale. OpenAI improved something annoying and unglamorous.
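To see why the caching detail matters, here is a back-of-the-envelope sketch. Only Sol's $5 input price comes from the announcement. The cached-token price multiplier and the cache hit rate are placeholders, since I have not seen published cached-token pricing for GPT-5.6, so treat the output as an illustration of the shape of the savings, not a quote.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_fraction: float,
    price_per_million: float,
    cached_price_multiplier: float,
) -> float:
    """Input-side cost when a share of tokens is billed at a cached rate."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    blended = (uncached * price_per_million
               + cached * price_per_million * cached_price_multiplier)
    return blended / 1_000_000

# $5.00 is Sol's list input price. The 0.5 multiplier and the 80% hit rate
# are hypothetical placeholders, not published figures.
baseline = input_cost_with_cache(100_000_000, 0.0, 5.00, 0.5)
with_cache = input_cost_with_cache(100_000_000, 0.8, 5.00, 0.5)
print(f"no cache: ${baseline:,.2f} · 80% cached: ${with_cache:,.2f}")
```

Swap in the real cached-token rate once OpenAI publishes it. The structure of the calculation stays the same.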

What deserves a side-eye:

  • The Terminal-Bench 2.1 results appear to be first-party. Every number I have came through OpenAI's own announcement. There are no published SWE-bench Verified or TAU-bench numbers for GPT-5.6 Sol yet, and those matter for comparing against Mythos 5 or Opus 4.8 on tasks builders actually run.
  • "20 trusted partners" is not a public program. The pathway to early access isn't published. You can't apply.
  • The mid-July timeline is an expectation, not a commitment. This is a government-paced review. OpenAI doesn't control the calendar.
  • OpenAI specifically marketed Sol as its "most capable cybersecurity model to date" for vulnerability research and exploitation tasks. That framing, combined with the government restriction, suggests the capability ceiling here is why access is being managed.

GPT-5.6 tier comparison

| Model | Best use case | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|---|
| Sol | Complex coding, security research | $5.00 | $30.00 |
| Terra | High-volume tasks, document analysis | $2.50 | $15.00 |
| Luna | Summarization, drafting, automation | $1.00 | $6.00 |

What builders need to know

  • You're waiting until mid-July at the earliest. Plan roadmaps accordingly. If you need Sol-level capability now, you're either one of 20 partners or you're waiting.
  • Luna ($1/$6) directly competes with sub-$1 Chinese open-weight models in the high-throughput tier. When access opens, run a head-to-head eval against whatever you're currently routing to at that price point.
  • Terra ($2.50/$15) undercuts Claude Opus 4.8's standard rate. If you're already running mid-tier Anthropic, Terra is worth benchmarking against your actual workloads.
  • Sol ultra mode is subagent orchestration at the API level — this is not a ChatGPT playground feature. For builders running complex multi-step agentic workflows, this is the capability to test first.
  • If your work involves cybersecurity, vulnerability research, or exploitation-adjacent tasks, Sol was specifically marketed for these use cases. That makes you a plausible candidate for the trusted-partner program, so flag your interest even though there's no public application path. You'd also rather be inside a government review process than outside it.
  • Wait for independent SWE-bench Verified numbers before making production decisions based on the Terminal-Bench 2.1 claims. First-party benchmarks are a starting point, not a verdict.
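When access opens, the head-to-head eval suggested above can be scored on cost per passed task, not just pass rate. This is a minimal sketch of that scoring step. The `EvalRun` class is my own, Luna's prices are from the announcement, and every task count, token count, and the incumbent's pricing are invented placeholders you would replace with results from your own workload.

```python
from dataclasses import dataclass

@dataclass
class EvalRun:
    model: str
    tasks: int
    passed: int
    input_tokens: int
    output_tokens: int
    input_price: float   # dollars per million input tokens
    output_price: float  # dollars per million output tokens

    @property
    def pass_rate(self) -> float:
        return self.passed / self.tasks

    @property
    def total_cost(self) -> float:
        return (self.input_tokens * self.input_price
                + self.output_tokens * self.output_price) / 1_000_000

    @property
    def cost_per_pass(self) -> float:
        # Infinite cost if nothing passed, so failed runs sort last.
        return self.total_cost / self.passed if self.passed else float("inf")

# Placeholder results: all counts and the incumbent's prices are hypothetical.
runs = [
    EvalRun("Luna", 200, 150, 4_000_000, 1_000_000, 1.00, 6.00),
    EvalRun("incumbent (hypothetical)", 200, 140, 4_000_000, 1_000_000, 0.60, 3.00),
]

ranked = sorted(runs, key=lambda r: r.cost_per_pass)
for run in ranked:
    print(f"{run.model}: pass rate {run.pass_rate:.0%}, "
          f"${run.cost_per_pass:.4f} per passed task")
```

With these made-up numbers the cheaper incumbent still wins on cost per passed task despite a lower pass rate, which is exactly the kind of result a list-price comparison would hide.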
