Gemini 3.5 Flash Launches at Google I/O 2026 as Default Model

Google just made its fastest model its smartest one too. At Google I/O 2026 on May 19 at Shoreline Amphitheatre in Mountain View, the company launched Gemini 3.5 Flash, the first model in the new Gemini 3.5 family and now the default engine behind the Gemini app and AI Mode in Google Search globally.

The headline number? Output runs 4x faster than other frontier models measured in tokens per second. And it still beats the older Gemini 3.1 Pro on the benchmarks that actually matter for agents.

What Google shipped

Google describes Gemini 3.5 Flash as “frontier intelligence with action.” Translation: it’s built to plan, call tools, spin up subagents, and grind through multi-step workflows without falling apart halfway through. The model was authored by CTO Koray Kavukcuoglu, Chief Scientist Jeff Dean, VP Oriol Vinyals, and VP Noam Shazeer. Basically the core Gemini team put their names on it.

On benchmarks, 3.5 Flash hits 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning for multimodal understanding. All four numbers top Gemini 3.1 Pro, which is the wild part. The cheaper, faster model is now beating the older flagship.

Artificial Analysis put it alone in the top-right quadrant of its Intelligence vs. Speed index. The only frontier model right now that combines top-tier intelligence with exceptional speed, according to that chart.

Pricing that undercuts everyone

API pricing lands at $0.50 per million input tokens and $3.00 per million output tokens for the thinking variant. Google says that’s less than half the cost of comparable frontier alternatives. If you’ve been paying GPT-5 or Claude Sonnet 4.6 rates for agent workloads, that math is going to hurt.

Inputs cover text, image, video, audio, and PDF. Context window sits at 1 million input tokens with a 64k output cap. You can pull it through Google Cloud Vertex AI, Google AI Studio, the Gemini API, Gemini CLI, Gemini Enterprise, Google Antigravity, and Android Studio. Basically anywhere Google ships a developer surface.

Where it’s already running

This isn’t a research drop. Gemini 3.5 Flash is the default model right now in the Gemini app worldwide and in Search AI Mode. It also powers Gemini Spark, Google’s new personal AI agent that runs 24/7 on dedicated Google Cloud VMs. Spark started its Beta rollout to Google AI Ultra subscribers in the US the same week of the launch.

Enterprise customers are already plugged in. Shopify is running 3.5 Flash subagents in parallel to crunch merchant data for global growth forecasting. Macquarie Bank uses it to speed up customer onboarding across 100-plus page documents. Salesforce baked it into Agentforce for multi-step enterprise automation. And Xero is letting it autonomously handle multi-week admin workflows including 1099 tax form prep.

The early numbers from partners are decent. Box reports a 15% accuracy gain on handwriting, long contracts, and messy financial data versus Gemini 2.5 Flash. Warp says command-line error resolution improved 8% in its Suggested Code Diffs feature. Harvey clocked a 7% lift on its BigLaw Bench for high-volume legal contract tasks.

The safety angle and what’s next

Google says 3.5 Flash was built under its Frontier Safety Framework with strengthened cyber and CBRN safeguards. There are also new interpretability tools that check the model’s inner reasoning before it answers. Whether that actually catches subtle agentic failure modes in production is the open question. Interpretability for long-horizon agents is hard, and nobody’s fully solved it yet.

Gemini 3.5 Pro, the bigger sibling, is already in internal use at Google. Public rollout is planned for June 2026.

For anyone building creative pipelines, the bigger picture matters more than the benchmarks. Faster, cheaper agentic models mean smarter content workflows, better tool calling across multimodal stacks, and AI that can actually coordinate between image, video, and text generation steps. The AI Video Generator on MagicShot already runs on models like VEO 3.1 and Seedance 2.0 for cinematic output. If you want the deeper backstory on how Google got here, check our earlier coverage of the Gemini 3 breakthrough.

The race for the default agent layer just got a new front-runner, and the next 90 days will tell us who responds first.

Share

Frequently Asked Questions

Google announced and released Gemini 3.5 Flash on May 19, 2026 at Google I/O 2026, held at Shoreline Amphitheatre in Mountain View, California. It is available to billions of users immediately via the Gemini app and AI Mode in Google Search.

Gemini 3.5 Flash costs $0.50 per million input tokens and $3.00 per million output tokens for the thinking variant. Google says that is less than half the cost of comparable frontier model alternatives.

Gemini 3.5 Flash outperforms the older Gemini 3.1 Pro on key benchmarks, scoring 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning. It also delivers output 4x faster than other frontier models.

Harish Prajapat (Author)

Hi, I’m Harish! I write about AI content, digital trends, and the latest innovations in technology.

Related news

Get the latest news, tips & tricks, and industry insights on the MagicShot.ai news.