Microsoft Just Launched 7 In-House AI Models at Build 2026. Here’s What Each One Does

AI News
8 min read
Published: June 4, 2026
Harish Prajapat

Seven models. One announcement. Zero OpenAI.

That’s what happened at Microsoft Build 2026 on June 2nd.

Microsoft didn’t just update a few tools. They launched an entirely in-house model family called MAI, covering reasoning, coding, image generation, transcription, and voice. All built from scratch. And they made a point of saying it directly: “We train from scratch. We don’t distill from other labs.”

That last part matters more than it sounds.

Microsoft has been OpenAI’s biggest financial backer since 2019, a $13 billion relationship that put GPT models inside Bing, Azure, Word, and basically everything Microsoft ships. But at Build 2026, they showed up with their own stack. Seven models deep. Built on their own Maia 200 silicon, co-designed in-house, with a claimed 1.4x efficiency advantage.

So what actually launched? Let’s go through all seven.

The 7 Microsoft MAI Models at a Glance

Before the deep dive, here’s the full list:

MAI-Thinking-1 – Reasoning and complex problem-solving
MAI-Code-1-Flash – Coding and developer tools
MAI-Image-2.5 – Text-to-image and image editing
MAI-Image-2.5-Flash – Lightweight image generation variant
MAI-Transcribe-1.5 – Speech-to-text across 43 languages
MAI-Voice-2 – Text-to-speech with voice cloning
MAI-Voice-2-Flash – Efficient voice generation (coming soon)

Now the breakdown.

MAI-Thinking-1: The Reasoning Model

This is the flagship. The one Microsoft leads with.

Key Specs

35 billion active parameters
256K context window (roughly 200,000 words in one session)
Mid-weight pricing tier
Available through: Foundry, OpenRouter, Fireworks, Baseten
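
The word count quoted for the context window comes from a rough tokens-to-words conversion. Here is a minimal sketch of that arithmetic, assuming about 0.75 English words per token (a common rule of thumb, not a figure Microsoft published) and reading 256K as 256,000 tokens. The function name is illustrative.

```python
# Rule-of-thumb ratio; this is an assumption, not a Microsoft-published number.
WORDS_PER_TOKEN = 0.75


def approx_words(context_tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a context window of this size."""
    return round(context_tokens * words_per_token)


print(approx_words(256_000))  # 192000
```

If 256K instead means 262,144 tokens, the same ratio gives about 196,600 words. Either way the estimate lands near the 200,000 quoted above, and the real figure will vary with language and tokenizer.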

What It Can Do

Handles complex multi-step reasoning problems
Strong long-context performance across coding and analysis tasks
Microsoft claims blind human evaluators preferred it over Claude Sonnet 4.6
Microsoft also says it matches Claude Opus 4.6 on the SWE-Bench Pro coding benchmark

Those are aggressive claims for a first-generation reasoning model. But the benchmark numbers appear to back them up.

MAI-Code-1-Flash: The Developer Model

Small, fast, already deployed.

Key Specs

5 billion active parameters
Lightweight agentic architecture
Positioned as cheaper than Anthropic’s Haiku
Live in GitHub Copilot and VS Code right now

What It Can Do

Agentic coding tasks inside the Microsoft developer stack
Inline code suggestions and completions
Optimized for speed, not raw reasoning power

If you code in VS Code or use GitHub Copilot, you’ve probably already used this model without knowing it.

MAI-Image-2.5 and MAI-Image-2.5-Flash: The Image Models

[Image: split-screen AI image generation comparison, with a photorealistic perfume bottle on marble beside an illustrated fantasy warrior in blue and gold armor]

These are the most relevant models for anyone working with AI creative tools.

Key Specs

Handles text-to-image and image editing in one model
Arena AI leaderboard score: 1,254 (#3 overall)
#2 specifically for image editing tasks
Flash variant available for high-volume, lower-cost workloads
Live in PowerPoint, rolling out to OneDrive

Benchmark Improvements Over Previous Version

+107 points on text rendering
+90 points on cartoon, anime, and fantasy styles
+72 points overall vs MAI-Image-2.0

Real, measurable gains. Especially the text rendering improvement, which has been a weak spot for AI image models for years.

How It Compares

But here’s what the leaderboard actually shows: MAI-Image-2.5 is #3. GPT Image 2.0 holds #1 on that same Arena benchmark. Tools already running on GPT Image 2.0, like the AI photo generator on MagicShot, sit above where Microsoft just landed. GPT Image 2.0 is at least 72 points ahead of MAI-Image-2.5’s debut score.

Microsoft competes on cost. GPT Image 2.0 leads on quality. That’s the honest comparison.
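
The leaderboard figures quoted above imply two scores the article does not state directly. The sketch below simply does the subtraction and addition. It assumes the overall gain over MAI-Image-2.0 is measured on the same Arena scale as the 1,254 score, and it treats the GPT Image 2.0 lead as a lower bound. Neither derived number is a published score.

```python
MAI_IMAGE_25_SCORE = 1254  # Arena score quoted in the article
GAIN_OVER_MAI_IMAGE_20 = 72  # overall gain vs MAI-Image-2.0, per the article
GPT_IMAGE_20_MIN_LEAD = 72  # quoted lead of GPT Image 2.0, used as a lower bound

# Derived values: implied by the quoted figures, not taken from any leaderboard.
implied_mai_image_20 = MAI_IMAGE_25_SCORE - GAIN_OVER_MAI_IMAGE_20
implied_gpt_image_20_floor = MAI_IMAGE_25_SCORE + GPT_IMAGE_20_MIN_LEAD

print(implied_mai_image_20)        # 1182
print(implied_gpt_image_20_floor)  # 1326
```

Read this way, the distance from MAI-Image-2.5 up to the top spot is at least as large as the jump Microsoft made between its last two image releases.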

MAI-Transcribe-1.5: The Transcription Model

Key Specs

Supports 43 languages
Claimed to be 5x faster than competing transcription models
State-of-the-art accuracy on the FLEURS and Artificial Analysis benchmarks
Streaming support coming soon (batch only right now)
Domain-specific terminology support built in

What Makes It Different

Most transcription tools collapse under specialist vocabulary. Medical notes, legal documents, finance calls, anything with industry-specific terms becomes a mess that needs hours of cleanup. MAI-Transcribe-1.5 handles domain terminology natively.

Microsoft calls it the best transcription model in the world. The benchmarks support that claim. The lack of streaming is a limitation worth knowing about if you need real-time transcription.

MAI-Voice-2 and MAI-Voice-2-Flash: The Voice Models

[Image: audio visualization graphic with a microphone icon over a waveform on a purple gradient background]

Key Specs

Voice generation across 15+ languages
Voice cloning from a short audio sample
Low latency for real-time applications
Built-in misuse safeguards
Flash variant coming soon at a lower cost

What It Can Do

Generate natural-sounding speech from text
Adapt to a target voice using a short clip
Run at low enough latency for real-time voice apps

One honest limitation: Microsoft hasn’t published direct quality benchmarks comparing MAI-Voice-2 to ElevenLabs or Cartesia. The strongest benchmark claims in the MAI launch cover reasoning and transcription. For voice, the selling points are language coverage (15+ languages) and latency. Quality claims are less specific.

What Makes MAI Different From Other Model Families

No Distillation

Many efficient models are trained by distillation, where a smaller model learns from a larger one’s outputs. Microsoft says MAI skips this entirely: the models are trained from scratch on clean, commercially licensed data. That matters for enterprises with IP concerns around model training.

Custom Silicon

Microsoft’s Maia 200 chip was co-designed specifically for MAI workloads. If the claimed 1.4x efficiency gain holds, it should translate into lower inference costs.

Frontier Tuning

This is the enterprise pitch. Reinforcement learning that adapts a MAI model to your specific workflows and data.

Microsoft’s own example: a MAI model tuned for Excel matches GPT-5.4 while running 10x more efficiently. If that holds up in real production workloads, the cost argument is significant for large organizations already inside the Azure ecosystem.
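
To put the two efficiency claims in cost terms, here is a minimal sketch. It assumes cost per unit of work scales inversely with efficiency, which is a simplification of ours rather than something Microsoft has stated, and the function name is illustrative.

```python
def relative_cost(efficiency_gain: float) -> float:
    """Cost per unit of work relative to a baseline of 1.0.

    Assumes cost is inversely proportional to efficiency (a simplification).
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return 1.0 / efficiency_gain


claims = [
    ("Maia 200 (claimed 1.4x)", 1.4),
    ("Excel-tuned MAI model (claimed 10x)", 10.0),
]
for label, gain in claims:
    cost = relative_cost(gain)
    print(f"{label}: {cost:.0%} of baseline cost, {1 - cost:.0%} saving")
```

Under that assumption, 1.4x works out to roughly 71% of the baseline cost (about a 29% saving), while 10x works out to 10% of it. Actual pricing will depend on how much of any saving Microsoft passes on.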

Where You Can Access MAI Models Right Now

Model	Where It’s Live
MAI-Thinking-1	Foundry, OpenRouter, Fireworks, Baseten
MAI-Code-1-Flash	GitHub Copilot, VS Code, Foundry
MAI-Image-2.5	PowerPoint, OneDrive (rolling out), Foundry
MAI-Image-2.5-Flash	Foundry, OpenRouter
MAI-Transcribe-1.5	Foundry, OpenRouter
MAI-Voice-2	Foundry, OpenRouter
MAI-Voice-2-Flash	Coming soon
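
For anyone scripting against this list, the table above can be written as a small lookup. This is a sketch of the article’s table only, not an official catalog: the platform names are copied from the table, the helper names are illustrative, and OneDrive is included even though the rollout is still in progress.

```python
# Availability as listed in the table above; an empty list means "coming soon".
AVAILABILITY: dict[str, list[str]] = {
    "MAI-Thinking-1": ["Foundry", "OpenRouter", "Fireworks", "Baseten"],
    "MAI-Code-1-Flash": ["GitHub Copilot", "VS Code", "Foundry"],
    "MAI-Image-2.5": ["PowerPoint", "OneDrive", "Foundry"],  # OneDrive still rolling out
    "MAI-Image-2.5-Flash": ["Foundry", "OpenRouter"],
    "MAI-Transcribe-1.5": ["Foundry", "OpenRouter"],
    "MAI-Voice-2": ["Foundry", "OpenRouter"],
    "MAI-Voice-2-Flash": [],
}


def models_on(platform: str) -> list[str]:
    """Return the models the table lists as live on the given platform."""
    return [model for model, platforms in AVAILABILITY.items() if platform in platforms]


def not_yet_available() -> list[str]:
    """Return the models the table lists as coming soon."""
    return [model for model, platforms in AVAILABILITY.items() if not platforms]


print(models_on("Fireworks"))  # ['MAI-Thinking-1']
print(not_yet_available())     # ['MAI-Voice-2-Flash']
```

The lookup makes one detail easy to see: Foundry carries every model that has launched, while Fireworks and Baseten are listed only for MAI-Thinking-1.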

API pricing hasn’t been published. Microsoft’s consistent message is “lower cost than comparable alternatives,” but until per-token rates go live, that’s a promise, not a number.

What This Means for the AI Landscape

Microsoft isn’t just launching models. They’re making a strategic statement.

They’re no longer content to be OpenAI’s biggest distribution partner. The MAI family signals that Microsoft wants to own the full stack: silicon (Maia 200), models (MAI family), and distribution (Foundry, Azure, Microsoft 365). That’s a fundamentally different company than the one that wrote a $13 billion check for GPT access.

The Video Gap

The most notable absence in the MAI lineup is video. Microsoft launched nothing for video generation. Not a single model. While other platforms are already generating cinema-quality clips with models like Veo 3.1 and Seedance 2.0, MAI has a blank space where video should be. If you want to generate AI video right now, you’re not waiting on Microsoft for that.

What It Means for Image Generation

MAI-Image-2.5 at #3 is a credible contender. But the models above it on the Arena leaderboard aren’t standing still. MagicShot gives you access to GPT Image 2.0, Nano Banana 2, and 56+ AI creative tools under one subscription, covering images, video, avatars, headshots, and product photography. The Arena rankings reflect where those models sit today, and for now that’s above where Microsoft just landed.

[Image: grid of eight AI creative content types, including product photography, editorial portrait, app icons, cinematic video still, fantasy character art, ecommerce mug shot, floating island landscape, and social media design]

Microsoft’s own MAI announcement is worth reading if you want the company’s framing in full. But the short version: a credible first in-house launch, real benchmark numbers, a clear strategic direction, and a video gap that’s going to be the next thing they need to fill.

For a breakdown of which AI creative tools actually matter for content creators right now, check out the best AI tools for content creators in 2026.

Frequently Asked Questions

What are Microsoft’s MAI models?

MAI is Microsoft’s in-house AI model family launched at Build 2026. It includes seven models covering reasoning (MAI-Thinking-1), coding (MAI-Code-1-Flash), image generation (MAI-Image-2.5 and Flash), transcription (MAI-Transcribe-1.5), and voice (MAI-Voice-2 and Flash). All were built from scratch on Microsoft’s Maia 200 silicon without distillation from other AI labs.

Is Microsoft replacing OpenAI with MAI?

Not a full replacement, but a significant shift. The MAI launch signals Microsoft wants its own AI stack running across its products and Azure infrastructure. They’ve built their own reasoning, coding, image, voice, and transcription models to cover the most common enterprise workloads without routing everything through a third-party API.

How does MAI-Image-2.5 compare to GPT Image 2.0?

MAI-Image-2.5 debuted at #3 on the Arena AI image leaderboard with a score of 1,254. GPT Image 2.0 holds #1, at least 72 points higher. MAI-Image-2.5 competes on lower cost and strong text rendering. GPT Image 2.0 leads on overall quality benchmarks.

Can I use Microsoft MAI models today?

Yes, for six of the seven. MAI-Code-1-Flash is live in GitHub Copilot and VS Code. MAI-Image-2.5 is live in PowerPoint and rolling out to OneDrive. Every launched model is available through Microsoft Foundry, several are also on OpenRouter, and MAI-Thinking-1 is on Fireworks and Baseten as well. MAI-Voice-2-Flash is still listed as coming soon. Per-token pricing hasn’t been published yet.

Does MAI include a video generation model?

No. None of the 7 MAI models cover video generation. The current lineup handles reasoning, coding, images, transcription, and voice. Video is a clear gap in the MAI family as of Build 2026.

Harish Prajapat (Author)

Hi, I’m Harish! I write about AI content, digital trends, and the latest innovations in technology.
