MiniMax M3 Lands: Open-Weight Model Hits GPT-5.5 Coding Scores
- AI News
- 4 min read
- Published: June 1, 2026
- Harish Prajapat
MiniMax just shipped M3. And it’s the first open-weight model that actually walks the walk on frontier coding, million-token context, and native multimodality all at once.
The Tencent and Alibaba backed lab dropped M3 on June 1, 2026, ending a long stretch where every open-weight model had to pick two of those three capabilities. M2 and M2.5 got MiniMax taken seriously. M3 is the one that puts the company in the same conversation as OpenAI, Anthropic, and Google.
Why this matters if you build with AI. You can now self-host a model that benchmarks above GPT-5.5 on real software engineering tasks, swallows entire codebases or hours of video in one prompt, and runs autonomous agents for days without phoning home to a closed API.
The benchmark numbers that actually matter
On SWE-Bench Pro, the industry’s nastiest real-world coding benchmark, M3 hits 59.0%. That beats GPT-5.5 at 58.6% and edges past Gemini 3.1 Pro. Claude Opus 4.7 still leads at 64.3%, so Anthropic keeps the coding crown. But the gap to an open model has never been this thin.
The agentic numbers are where it gets spicy. M3 scores 83.5 on BrowseComp, which is autonomous web browsing, and that’s clear of Opus 4.7’s 79.3. Terminal-Bench 2.1 comes in at 66.0%. MCP Atlas, which tests multi-tool agent coordination, lands at 74.2%.
On an internal long-horizon agentic benchmark M3 scores 0.37. Opus 4.7 sits at 0.42 and GPT-5.5 at 0.39. So closed models still win the marathon. But M3 is ahead of every other open model tested, by a margin that isn’t close.
MSA is doing the heavy lifting
The architecture story here is MiniMax Sparse Attention, or MSA. It’s a new attention mechanism built entirely in-house. The payoff is brutal efficiency at long context. Prefill runs 9.7x faster than M2. Decoding hits 15.6x faster at 1M tokens.
Translation. A million-token prompt that used to be a budget killer is now actually usable in production.

The 24 hour autonomous coding run
MiniMax stress tested M3 on a GPU kernel optimization task. The model ran for about 24 hours straight. It made 147 benchmark submissions, executed 1,959 tool calls, and worked through six landmark rounds of optimization. Starting point was a non-functional baseline. End point was production-grade code. Zero human in the loop.
That’s the kind of run that makes you rethink what an agent actually is. Most current agents tap out after an hour because of context limits or compounding errors. M3 paired with the updated MiniMax Code, which now uses a Producer plus Verifier adversarial loop, just keeps going.
Multimodal as a first-class citizen
M3 takes images, video, and desktop computer use as native inputs. Not bolted on after the fact. The training data pipeline got rebuilt from scratch to handle interleaved multimodal data, and the total training set scaled to roughly 100 trillion tokens.
For creative workflows this is the interesting bit. A model that can watch a video, reason across a million tokens, and operate a desktop is the missing piece for autonomous content pipelines. Pair that with an AI Video Generator and you can start sketching workflows where one model plans, generates, reviews, and iterates without a human babysitter.
Pricing and the catch
The M3 API ships on OpenRouter at $0.30 per million input tokens and $1.20 per million output tokens. Anything over 512K input tokens jumps to a higher long-context tier. That’s competitive with Chinese open-weight rivals like the recently released Kimi K2.5 and undercuts every closed frontier model on a per-token basis.
The catch. M3 still loses to Claude Opus 4.7 on the hardest coding tasks, and the long-horizon agentic gap with GPT-5.5 is real if narrow. This isn’t a clean win across the board. It’s a credible open-weight option that’s finally good enough to be a default choice instead of a fallback.
Expect the next round of open-weight releases to chase MSA-style efficiency hard, because long context just stopped being a premium feature.
Frequently Asked Questions
M3 is MiniMax’s first open-weight frontier model released on June 1, 2026. It combines frontier-level coding intelligence, a 1-million-token context window, and native multimodal input for images, video, and desktop computer use inside a single model.
On SWE-Bench Pro, M3 scores 59.0% which beats GPT-5.5 at 58.6% and Gemini 3.1 Pro. It trails Claude Opus 4.7 at 64.3%. On BrowseComp, M3 scores 83.5 which beats Opus 4.7’s 79.3.
On OpenRouter the M3 API is priced at $0.30 per million input tokens and $1.20 per million output tokens. Calls that exceed 512K input tokens move to a higher long-context billing tier.
