Vol. 1 · Edition 023 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

Build 2026

Project Polaris
MAI-Thinking-1
WAF v1.0
Industry
By Sam Taylor with Samwise

On the in-house MoE coding model, the MAI family spanning image and reasoning, and why building your own AI the same month your key vendor files to go public is not a coincidence.

Project Polaris gives Microsoft a Copilot it built. That changes the OpenAI math.

Source lean on this story: Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 0

For three years, Copilot has been a Microsoft product running on OpenAI's models. Build 2026, which opened this morning in San Francisco, puts a clock on that sentence.

Project Polaris is Microsoft's answer. In-house mixture-of-experts coding model. Runs on custom Maia AI accelerators inside Azure. Targeting GitHub Copilot GA by August 2026. Per Microsoft's internal benchmarks, it beats GPT-4 Turbo on HumanEval and MBPP — the field-standard code generation evaluations — with the largest gains in low-resource languages like Rust and Haskell.

Those benchmarks are Microsoft's. No independent reproduction exists at the time I'm writing this. First-party numbers on an unshipped model should be treated as marketing until someone outside the company confirms them. I'll keep saying that.

What I don't need third-party confirmation for: Polaris runs on Maia accelerators, inside Azure, on infrastructure Microsoft controls. That's an inference stack they own end-to-end. Real, regardless of whether the HumanEval number holds.

Polaris isn't the only announcement. Microsoft shipped MAI-Image 2.5, MAI-Voice 2, MAI-Transcribe 1.5, and MAI-Thinking-1 alongside it. That last one is the entry worth sitting with. MAI-Thinking-1 is Microsoft's first dedicated reasoning model. You don't build a reasoning model if you plan to keep buying reasoning from somewhere else.

Copilot was rearchitected as a multi-model platform. It now routes tasks across OpenAI, Anthropic, and open-source models depending on the workload. The framing is careful — Microsoft didn't announce "Copilot now runs on Polaris." They announced "Copilot routes to the best model for the job." Polaris is one option. OpenAI is still in there. So is Anthropic.
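
To make the routing idea concrete, here is a minimal Python sketch of workload-based routing. Everything in it is an assumption for illustration: the model labels, the workload names, the `route` function, and the table itself are hypothetical. Microsoft has not published how Copilot's router decides, so read this as the general shape of the pattern, not their implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """The chosen model plus the reason it was chosen (useful for logs)."""
    model: str
    reason: str


# Hypothetical routing table: workload kind -> model label.
# These labels are placeholders, not real product identifiers.
ROUTING_TABLE = {
    "code_completion": "in-house-coding-model",
    "long_reasoning": "vendor-reasoning-model",
    "bulk_summarize": "open-source-model",
}
DEFAULT_MODEL = "vendor-general-model"


def route(workload: str, pinned_model: Optional[str] = None) -> Route:
    """Pick a model for a workload. An explicit user pin always wins."""
    if pinned_model is not None:
        return Route(pinned_model, "pinned by user")
    if workload in ROUTING_TABLE:
        return Route(ROUTING_TABLE[workload], f"table match for {workload}")
    return Route(DEFAULT_MODEL, "no table match, using default")


if __name__ == "__main__":
    for kind in ("code_completion", "long_reasoning", "image_caption"):
        print(kind, "->", route(kind))
    print("pinned ->", route("code_completion", pinned_model="my-chosen-model"))
```

Returning the reason alongside the model is the part worth copying. If the table changes underneath you, a logged reason tells you why an answer changed.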

And Windows Agent Framework v1.0 shipped as a production MIT-licensed release — the formal convergence of AutoGen and Semantic Kernel into one SDK. Agents defined in YAML. Portable across local Windows, Windows 365, and Azure without changing the manifest. Cross-agent communication over gRPC. Memory service for persistent caching. The kind of infrastructure layer that takes eighteen months to appreciate.

Microsoft's AI independence milestones
  1. Apr 2, 2026

    Windows Agent Framework v1.0 ships

    MIT-licensed production release — AutoGen and Semantic Kernel formally merged.

  2. Apr 2026

    OpenAI exclusivity clause removed

    Revised Microsoft agreement lets OpenAI sell through Amazon and Google.

  3. Jun 2, 2026

    Build 2026: Polaris + full MAI family

    In-house coding model, reasoning model, plus image/voice/transcription models announced.

  4. Aug 2026

    Project Polaris GA in Copilot (target)

    20M+ paid Copilot users switch to a model Microsoft built and controls.

Pros & cons

What's real:

  • Polaris on Maia accelerators inside Azure means Microsoft controls latency, pricing, and data residency for its most-used developer product. These operational advantages compound over time and don't show up in a benchmark.
  • Multi-model Copilot routing is architecturally correct. Different workloads have different cost and quality tradeoffs. Builders get routing flexibility they didn't have before.
  • MAI-Thinking-1 existing at all is the signal. You build a dedicated reasoning model when you're serious about the full stack — not just the code-completion tier.
  • WAF v1.0 being MIT-licensed means commercial production use is unambiguous. The YAML-based agent portability pattern is worth learning before the GA ecosystem hardens around it.

What deserves a side-eye:

  • The benchmark claim — "beats GPT-4 Turbo on HumanEval and MBPP" — is Microsoft's internal assessment on a model not yet in GA. No SWE-bench Verified. No TAU-bench. No third-party reproduction. Don't make production architecture decisions based on these numbers yet.
  • Polaris doesn't reach GA until August. A Build announcement and a shipping model are two different things, and the gap between them has historically included surprises.
  • Multi-model Copilot routing adds debugging surface area. If the routing logic changes which model handles your task, answers change in ways you may not have opted into.

For builders

  • Don't rebuild production Copilot workflows around Polaris-specific behavior until it reaches GA in August. Watch the GitHub Changelog for rollout details.
  • Multi-model Copilot routing is live now. In VS Code, you can explicitly pick Claude or a specific OpenAI model for sessions where that tradeoff makes sense — check the Copilot model selector.
  • Windows Agent Framework v1.0 is MIT-licensed and production-ready today. The YAML agent definition and cross-environment portability pattern is worth prototyping before the broader GA ecosystem forms around it.
  • MAI-Thinking-1 is not yet in the Azure AI Foundry API. Watch the Azure AI Foundry changelog for GA date and pricing before building plans that depend on it.
  • The Polaris HumanEval/MBPP claims are first-party. Run your own coding evals against production Copilot once Polaris ships before making architectural decisions based on those numbers.
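
One way to structure those evals, as a sketch: sample several completions per task, count how many pass your own tests, and compute pass@k, the metric HumanEval-style benchmarks report. The estimator below is the standard unbiased form, 1 - C(n-c, k) / C(n, k). The task names and counts are made-up placeholders, and generating and running the completions is left to your own harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task.

    n: completions sampled, c: completions that passed the tests,
    k: how many attempts the metric allows.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1. comb() returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: dict[str, tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks. results maps task name -> (n, c)."""
    if not results:
        raise ValueError("results is empty")
    scores = [pass_at_k(n, c, k) for n, c in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder counts: (samples drawn, samples that passed your tests).
    my_results = {
        "parse_config": (10, 3),
        "retry_logic": (10, 0),
        "sql_migration": (10, 10),
    }
    print("pass@1:", round(mean_pass_at_k(my_results, 1), 3))
    print("pass@5:", round(mean_pass_at_k(my_results, 5), 3))
```

Run the same tasks against whichever model Copilot routes you to today, then again after Polaris ships. The comparison on your own code is the number that matters.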
