Microsoft Just Launched 7 In-House AI Models at Build 2026. Here’s What Each One Does

Seven models. One announcement. Zero OpenAI.

That’s what happened at Microsoft Build 2026 on June 2nd.

Microsoft didn’t just update a few tools. They launched an entirely in-house model family called MAI, covering reasoning, coding, image generation, transcription, and voice. All built from scratch. And they made a point of saying it directly: “We train from scratch. We don’t distill from other labs.”

That last part matters more than it sounds.

Microsoft has been OpenAI’s biggest financial backer since 2019, a $13 billion relationship that put GPT models inside Bing, Azure, Word, and basically everything Microsoft ships. But at Build 2026, they showed up with their own stack. Seven models deep. Built on their own Maia 200 silicon, co-designed in-house, with a claimed 1.4x efficiency advantage.

So what actually launched? Let’s go through all seven.

The 7 Microsoft MAI Models at a Glance

Before the deep dive, here’s the full list:

  • MAI-Thinking-1 – Reasoning and complex problem-solving
  • MAI-Code-1-Flash – Coding and developer tools
  • MAI-Image-2.5 – Text-to-image and image editing
  • MAI-Image-2.5-Flash – Lightweight image generation variant
  • MAI-Transcribe-1.5 – Speech-to-text across 43 languages
  • MAI-Voice-2 – Text-to-speech with voice cloning
  • MAI-Voice-2-Flash – Efficient voice generation (coming soon)

Now the breakdown.

MAI-Thinking-1: The Reasoning Model

This is the flagship. The one Microsoft leads with.

Key Specs

  • 35 billion active parameters
  • 256K context window (roughly 200,000 words in one session)
  • Mid-weight pricing tier
  • Available through: Foundry, OpenRouter, Fireworks, Baseten
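
The word estimate in the context-window bullet follows from a common rule of thumb of about 0.75 English words per token. That ratio is an assumption, not a published figure for MAI-Thinking-1's tokenizer, and "256K" is read here as 256 × 1,024 tokens. A minimal sketch of the arithmetic:

```python
# Rough tokens-to-words estimate for a context window.
# WORDS_PER_TOKEN is a rule-of-thumb assumption for English text,
# not a measured value for MAI-Thinking-1's tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Return the approximate number of English words that fit in the window."""
    return int(context_tokens * words_per_token)


context_tokens = 256 * 1024  # "256K" read as 262,144 tokens (assumption)
approx_words = estimate_words(context_tokens)
print(f"{context_tokens:,} tokens is roughly {approx_words:,} words")
```

Under these assumptions the result lands a little under 200,000 words, consistent with the rounded figure above. Code and non-English text often tokenize less efficiently, so the real number will vary.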

What It Can Do

  • Handles complex multi-step reasoning problems
  • Strong long-context performance across coding and analysis tasks
  • Microsoft claims blind human evaluators preferred it over Claude Sonnet 4.6
  • Matches Claude Opus 4.6 on the SWE-Bench Pro coding benchmark, according to Microsoft

Those are aggressive claims for a first-generation reasoning model. But the benchmark numbers Microsoft has published appear to back them up.

MAI-Code-1-Flash: The Developer Model

Small, fast, already deployed.

Key Specs

  • 5 billion active parameters
  • Lightweight agentic architecture
  • Positioned as cheaper than Anthropic’s Haiku
  • Live in GitHub Copilot and VS Code right now

What It Can Do

  • Agentic coding tasks inside the Microsoft developer stack
  • Inline code suggestions and completions
  • Optimized for speed, not raw reasoning power

If you code in VS Code or use GitHub Copilot, you’ve probably already used this model without knowing it.

MAI-Image-2.5 and MAI-Image-2.5-Flash: The Image Models

This is the most relevant model for anyone working with AI creative tools.

Key Specs

  • Handles text-to-image and image editing in one model
  • Arena AI leaderboard score: 1,254 (#3 overall)
  • #2 specifically for image editing tasks
  • Flash variant available for high-volume, lower-cost workloads
  • Live in PowerPoint, rolling out to OneDrive

Benchmark Improvements Over Previous Version

  • +107 points on text rendering
  • +90 points on cartoon, anime, and fantasy styles
  • +72 points overall vs MAI-Image-2.0

Real, measurable gains. Especially the text rendering improvement, which has been a weak spot for AI image models for years.

How It Compares

But here’s what the leaderboard actually shows: MAI-Image-2.5 is #3, and GPT Image 2.0 holds #1 on that same Arena benchmark, roughly 72 points ahead of MAI-Image-2.5’s debut score. Tools already running on GPT Image 2.0, like the AI photo generator on MagicShot, are using a model ranked above where Microsoft just landed.

Microsoft competes on cost. GPT Image 2.0 leads on quality. That’s the honest comparison.
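
To put the 72-point gap in perspective, here is a minimal sketch that converts a rating difference into an expected head-to-head preference rate. It assumes the Arena score behaves like a standard Elo rating (base 10, scale 400). The article does not state how the leaderboard computes its scores, so treat the result as an illustration rather than a published figure. The function name `expected_win_rate` is hypothetical.

```python
# Convert a rating gap into an expected win rate under the standard Elo formula.
# Assumption: Arena scores are Elo-like (base 10, scale 400). The leaderboard's
# actual scoring method is not described in the article.


def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Probability that the higher-rated model wins a single pairwise vote."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


mai_image_score = 1254   # MAI-Image-2.5 debut score (from the article)
gap_to_leader = 72       # approximate lead of GPT Image 2.0 (from the article)
leader_score = mai_image_score + gap_to_leader  # derived here, not a published score

p = expected_win_rate(gap_to_leader)
print(f"Leader at ~{leader_score}: preferred in about {p:.0%} of head-to-head votes")
```

Under that assumption, a 72-point lead means the top model would be preferred in roughly six of ten direct comparisons: a real edge, but not a blowout.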

MAI-Transcribe-1.5: The Transcription Model

Key Specs

  • Supports 43 languages
  • 5x faster than competing transcription models
  • SOTA accuracy on FLEURS and Artificial Analysis benchmarks
  • Streaming support coming soon (batch only right now)
  • Domain-specific terminology support built in

What Makes It Different

Most transcription tools collapse under specialist vocabulary. Medical notes, legal documents, finance calls, anything with industry-specific terms becomes a mess that needs hours of cleanup. MAI-Transcribe-1.5 handles domain terminology natively.

Microsoft calls it the best transcription model in the world. The benchmarks support that claim. The streaming gap is a limitation worth knowing about if you need real-time transcription.

MAI-Voice-2 and MAI-Voice-2-Flash: The Voice Models

Key Specs

  • Voice generation across 15+ languages
  • Voice cloning from a short audio sample
  • Low latency for real-time applications
  • Built-in misuse safeguards
  • Flash variant coming soon at a lower cost

What It Can Do

  • Generate natural-sounding speech from text
  • Adapt to a target voice using a short clip
  • Run at low enough latency for real-time voice apps

One honest limitation: Microsoft hasn’t published direct quality benchmarks comparing MAI-Voice-2 to ElevenLabs or Cartesia. The strongest benchmark claims in the MAI launch cover reasoning and transcription. For voice, the selling points are language coverage (15+ languages) and latency. Quality claims are less specific.

What Makes MAI Different From Other Model Families

No Distillation

Many small, efficient models are trained by distillation, where a smaller model learns from a larger one’s outputs. MAI skips this entirely: Microsoft says the models are trained from scratch on clean, commercially licensed data. That matters for enterprises with IP concerns around model training.
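
For readers unfamiliar with distillation, the sketch below shows the generic idea in its textbook form: a student model is penalized for diverging from a teacher model's softened output distribution. This illustrates the general technique MAI is said to avoid. It is not code from Microsoft or any other lab, and the logits, temperature, and function names are made up for the example.

```python
import math

# Textbook knowledge-distillation loss: KL divergence between the teacher's and
# the student's temperature-softened output distributions.
# All values below are illustrative, not taken from any real model.


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over the softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, 0.5]  # hypothetical teacher logits for three tokens
student = [2.5, 1.5, 1.0]  # hypothetical student logits for the same tokens

print(f"distillation loss: {distillation_loss(teacher, student):.4f}")
print(f"identical outputs: {distillation_loss(teacher, teacher):.4f}")
```

Training a student to minimize this loss makes it imitate the teacher, which is why the provenance of the teacher's outputs can become an IP question. A model trained from scratch has no such dependency.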

Custom Silicon

Microsoft’s Maia 200 chip was co-designed specifically for MAI workloads. If the claimed 1.4x efficiency gain holds up, it should translate into lower inference costs.
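
As a back-of-the-envelope check on what a 1.4x efficiency gain could mean, the sketch below assumes inference cost scales inversely with efficiency. That is a simplification, since real pricing also depends on utilization, margins, and memory. The baseline price is a placeholder, because Microsoft has not published per-token rates.

```python
# Back-of-the-envelope: how an efficiency multiplier could change cost per unit
# of work. Assumption: cost is inversely proportional to efficiency.
# BASELINE_COST is a made-up placeholder, not a published price.
BASELINE_COST = 1.00  # hypothetical cost per million tokens on baseline hardware


def cost_after_gain(baseline_cost: float, efficiency_gain: float) -> float:
    """Cost for the same work when hardware is `efficiency_gain` times as efficient."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return baseline_cost / efficiency_gain


new_cost = cost_after_gain(BASELINE_COST, 1.4)  # Maia 200's claimed gain
savings = 1 - new_cost / BASELINE_COST

print(f"cost: {new_cost:.3f} per million tokens, about {savings:.0%} lower")
```

Under this simple model, a 1.4x gain works out to a saving of roughly 29 percent per unit of work, which is meaningful but not transformative on its own.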

Frontier Tuning

This is the enterprise pitch. Reinforcement learning that adapts a MAI model to your specific workflows and data.

Microsoft’s own example: a MAI model tuned for Excel matches GPT-5.4 while running 10x more efficiently. If that holds up in real production workloads, the cost argument is significant for large organizations already inside the Azure ecosystem.

Where You Can Access MAI Models Right Now

Model                  Where It’s Live
MAI-Thinking-1         Foundry, OpenRouter, Fireworks, Baseten
MAI-Code-1-Flash       GitHub Copilot, VS Code, Foundry
MAI-Image-2.5          PowerPoint, OneDrive (rolling out), Foundry
MAI-Image-2.5-Flash    Foundry, OpenRouter
MAI-Transcribe-1.5     Foundry, OpenRouter
MAI-Voice-2            Foundry, OpenRouter
MAI-Voice-2-Flash      Coming soon
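
If you are deciding which provider to integrate with, the availability table above can be encoded as a small lookup. This is only a convenience sketch built from the table in this article. The dictionary and helper names are hypothetical, and availability may change after launch.

```python
# Availability as listed in this article at launch; may change over time.
AVAILABILITY: dict[str, list[str]] = {
    "MAI-Thinking-1": ["Foundry", "OpenRouter", "Fireworks", "Baseten"],
    "MAI-Code-1-Flash": ["GitHub Copilot", "VS Code", "Foundry"],
    "MAI-Image-2.5": ["PowerPoint", "OneDrive (rolling out)", "Foundry"],
    "MAI-Image-2.5-Flash": ["Foundry", "OpenRouter"],
    "MAI-Transcribe-1.5": ["Foundry", "OpenRouter"],
    "MAI-Voice-2": ["Foundry", "OpenRouter"],
    "MAI-Voice-2-Flash": [],  # listed as "coming soon"
}


def models_on(platform: str) -> list[str]:
    """Return the models the article lists as live on the given platform."""
    return [model for model, platforms in AVAILABILITY.items() if platform in platforms]


print("OpenRouter:", models_on("OpenRouter"))
print("Not yet live:", [m for m, p in AVAILABILITY.items() if not p])
```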

API pricing hasn’t been published. Microsoft’s consistent message is “lower cost than comparable alternatives,” but until per-token rates go live, that’s a promise, not a number.

What This Means for the AI Landscape

Microsoft isn’t just launching models. They’re making a strategic statement.

They’re no longer content to be OpenAI’s biggest distribution partner. The MAI family signals that Microsoft wants to own the full stack: silicon (Maia 200), models (MAI family), and distribution (Foundry, Azure, Microsoft 365). That’s a fundamentally different company than the one that wrote a $13 billion check for GPT access.

The Video Gap

The most notable absence in the MAI lineup is video. Microsoft launched nothing for video generation. Not a single model. While other platforms are already generating cinema-quality clips with models like Veo 3.1 and Seedance 2.0, MAI has a blank space where video should be. If you want to generate AI video right now, you’re not waiting on Microsoft for that.

What It Means for Image Generation

MAI-Image-2.5 at #3 is a credible contender. But the models above it on the Arena leaderboard aren’t standing still. MagicShot gives you access to GPT Image 2.0, Nano Banana 2, and 56+ AI creative tools under one subscription, covering images, video, avatars, headshots, and product photography. The Arena rankings reflect where those models sit today, and that is above where Microsoft just landed.

The full MAI announcement is worth reading if you want Microsoft’s framing in full. But the short version: a credible first in-house launch, real benchmark numbers, a clear strategic direction, and a video gap that’s going to be the next thing they need to fill.

For a breakdown of which AI creative tools actually matter for content creators right now, check out the best AI tools for content creators in 2026.

Frequently Asked Questions

What is Microsoft MAI?

MAI is Microsoft’s in-house AI model family launched at Build 2026. Seven models covering reasoning (MAI-Thinking-1), coding (MAI-Code-1-Flash), image generation (MAI-Image-2.5 and Flash), transcription (MAI-Transcribe-1.5), and voice (MAI-Voice-2 and Flash). All built from scratch on Microsoft’s Maia 200 silicon without distillation from other AI labs.

Is Microsoft replacing OpenAI?

Not a full replacement, but a significant shift. The MAI launch signals Microsoft wants its own AI stack running across its products and Azure infrastructure. They’ve built their own reasoning, coding, image, voice, and transcription models to cover the most common enterprise workloads without routing everything through a third-party API.

How does MAI-Image-2.5 compare to GPT Image 2.0?

MAI-Image-2.5 debuted at #3 on the Arena AI image leaderboard with a score of 1,254. GPT Image 2.0 holds #1, roughly 72 points higher. MAI-Image-2.5 competes on lower cost and strong text rendering. GPT Image 2.0 leads on overall quality benchmarks.

Can I use MAI models right now?

Yes. MAI-Code-1-Flash is live in GitHub Copilot and VS Code. MAI-Image-2.5 is live in PowerPoint and rolling out to OneDrive. Six of the seven models are available through Microsoft Foundry, several are also on OpenRouter, and MAI-Thinking-1 is additionally on Fireworks and Baseten. MAI-Voice-2-Flash is still listed as coming soon. Per-token pricing hasn’t been published yet.

Does MAI include a video generation model?

No. None of the seven MAI models cover video generation. The current lineup handles reasoning, coding, images, transcription, and voice. Video is a clear gap in the MAI family as of Build 2026.

Harish Prajapat (Author)

Hi, I’m Harish! I write about AI content, digital trends, and the latest innovations in technology.
