← News Jun 29, 2026

Sakana Fugu: The Multi-Agent AI That Orchestrates Claude, GPT, and Gemini at Once

Sakana Fugu is a multi-agent AI system that coordinates Claude, GPT, and Gemini through a single API, then synthesizes their outputs into one answer.

Harish Prajapat

MagicShot team · 8 min read

Sakana Fugu: The Multi-Agent AI That Orchestrates Claude, GPT, and Gemini at Once

On this page

Who Built Sakana Fugu
How Fugu Actually Works
The Two Research Systems Behind It
What Happens When You Send a Request
Fugu vs Fugu Ultra: Which One
Benchmark Results
The Caveat You Should Know
Pricing
Subscription Plans (include both Fugu and Fugu Ultra)
Pay-as-You-Go (token-based)
Where Fugu Excels
Best Use Cases
Where It Falls Short
How to Access Fugu
Direct Access
Third-Party Platforms
Enterprise Options
What Makes This Actually Different
FAQs

What if you didn't have to pick one model?

That's the core idea behind Sakana Fugu, a multi-agent AI system launched by Sakana AI that coordinates Claude, GPT, and Gemini through a single OpenAI-compatible API endpoint. One request in. The system routes it, delegates it, synthesizes it, and sends one answer back.

It doesn't just run multiple models. It learned how to make them work together.

The results are hard to ignore. Fugu Ultra leads on 10 of 11 major benchmarks, outperforming Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro on coding, reasoning, and scientific tasks. On SWE-Bench Pro, the coding benchmark that most developers trust, Fugu Ultra scores 73.7% compared to Claude Opus 4.8's 69.2% and GPT-5.5's 58.6%.

One big caveat: those are Sakana-reported numbers, not independently verified yet. That matters, and we'll get into it.

Who Built Sakana Fugu

Sakana AI was founded by Llion Jones and David Ha. Jones is one of the eight co-authors of "Attention Is All You Need," the 2017 paper that introduced the transformer architecture everything in AI now runs on. Ha was previously at Google Brain.

That pedigree explains the research-first approach. Fugu isn't built on prompt tricks or fixed routing rules. It's built on two papers published at ICLR 2026.

How Fugu Actually Works

The Two Research Systems Behind It

TRINITY assigns three distinct roles across the model pool:

Thinker - breaks down the problem and plans an approach
Worker - executes the actual task
Verifier - checks the output for errors and inconsistencies

The coordinator is lightweight and "evolved" rather than hand-coded, meaning the role assignments learned to be efficient rather than following a ruleset someone wrote.

Conductor takes a different approach. It uses reinforcement learning to discover coordination strategies between agents. Not strategies humans designed. Strategies the system found by experimenting.

Sakana describes it as: "instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns."

That last phrase matters. The coordination patterns aren't intuitive. They're just effective.

What Happens When You Send a Request

Your prompt hits the single API endpoint
Fugu's orchestrator decides whether to handle it directly or delegate to specialists
The relevant models process their assigned portions independently
A synthesizer evaluates all outputs, detects contradictions, weights reasoning chains
One consolidated answer comes back

The whole process is invisible from the outside. Your application sends a standard OpenAI-format request. It gets a response. What happened in between is Sakana's proprietary system.

Fugu vs Fugu Ultra: Which One

Two tiers. Different priorities.

	Fugu	Fugu Ultra
Priority	Speed and low latency	Answer quality
Best for	Coding, review, interactive work	Complex reasoning, research, hard problems
Agent coordination	Standard	Coordinates more expert agents
Response time	Faster	Slower
Pay-as-you-go	Underlying model rate	$5/$30 per 1M input/output tokens

For most daily coding and review tasks, standard Fugu is the right call. For anything where getting the best possible answer matters more than getting it fast, Fugu Ultra is the one to use.

Benchmark Results

[IMAGE: Clean data visualization comparing Fugu Ultra against competing AI models on coding and reasoning benchmarks, bar chart style, dark background, teal and purple bars, showing clear performance gaps]

Fugu Ultra leads on 10 of 11 benchmarks tested. Here's what the numbers look like on the ones that matter most:

Benchmark	Fugu Ultra	Claude Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-Bench Pro	73.7%	69.2%	58.6%	54.2%
LiveCodeBench Pro	90.8%	84.8%	88.4%	82.9%
TerminalBench 2.1	82.1%	74.6%	78.2%	70.3%
Humanity's Last Exam	50.0%	49.8%	41.4%	44.4%
MRCRv2	93.6%	—	94.8%	—

The only loss is MRCRv2, where GPT-5.5 edges ahead at 94.8% versus 93.6%. Every other benchmark goes to Fugu Ultra.

The Caveat You Should Know

All of these numbers are self-reported by Sakana AI. Independent verification hasn't happened yet. That's not unusual for a new product launch, but it does mean the benchmark claims deserve healthy skepticism until third parties start testing. The code review demonstration, where beta testers saw Fugu Ultra surface 20+ issues versus roughly 3 from competitors, is more immediately usable as signal than the benchmark table.

Pricing

Subscription Plans (include both Fugu and Fugu Ultra)

Standard: $20/month - lightweight daily use
Pro: $100/month - 10x usage for focused work sessions
Max: $200/month - 30x usage for heavy workloads
Free second month promo if you subscribe before July 2026

Pay-as-You-Go (token-based)

Fugu: Standard rate of the underlying model being used
Fugu Ultra (standard context): $5 input / $30 output per 1M tokens
Fugu Ultra (context over 272K tokens): $10 input / $45 output per 1M tokens
Cached input: $0.50 to $1.00 per 1M tokens

The subscription tiers are straightforward. The pay-as-you-go pricing for Fugu Ultra is on the expensive end, particularly at the long-context rate. For high-volume production workloads, the cost math needs careful attention before committing.

One pricing detail worth noting: when multiple agents run in parallel, you pay only the rate of the highest-tier model involved. Not a stacked fee for each agent.

Where Fugu Excels

Best Use Cases

Code review at depth - finding edge cases, security issues, and logic bugs across large codebases
Complex multi-step reasoning - problems that require planning before executing
Research and paper reproduction - scientific analysis with multiple validation steps
Security assessments - penetration testing scenarios where thoroughness beats speed
Long-context analysis - patent analysis, document review, legal summarization
High-stakes accuracy - medical, financial, and legal domains where errors are costly

Where It Falls Short

Be honest about this before you adopt it:

Real-time applications - orchestration overhead adds latency. If you need sub-second responses, Fugu adds friction
Simple, well-defined tasks - routing a basic question through three models is overkill and costs more than it should
High-volume, cost-sensitive operations - the per-token rate for Fugu Ultra at scale is not cheap
Compliance-sensitive routing - the orchestration logic is proprietary and non-auditable. If your organization needs to know exactly which model processed which data, Fugu can't fully satisfy that
EU/EEA users - not available yet, GDPR compliance work is still in progress

How to Access Fugu

Direct Access

Via sakana.ai/fugu/ with a standard OpenAI-compatible API client
Available globally except EU/EEA

Third-Party Platforms

OpenRouter - multi-model routing platform
Vercel AI Gateway - for Next.js and Vercel deployments
Creao - AI workflow platform

Enterprise Options

Opt out of specific providers for privacy and compliance requirements
Custom configurations for regulated industries
Token usage and costs reported per request for monitoring

What Makes This Actually Different

Most "multi-model" approaches in AI right now are prompt-chaining tools. You write a chain, pick which model runs which step, and manage the outputs yourself. LangChain, AutoGen, and similar frameworks work this way. They're powerful but they require you to design the workflow.

Fugu removes that design step. The orchestration layer learned when to delegate, how agents should communicate, and how to combine outputs. You don't configure a pipeline. You send a request.

That's the practical difference. And it's why the code review result is compelling. Nobody designed a "find bugs with multiple models" workflow. The system figured out how to do it and did it better than any single model alone.

For teams already building on AI tools and wondering whether adding model orchestration is worth the complexity, Fugu makes the answer simpler. The complexity is already handled.

If you're evaluating where AI fits into your creative and content workflows more broadly, the AI tools guide for content creators covers how platforms like MagicShot are already running multiple frontier models including GPT Image 2.0, Nano Banana 2, VEO 3.1, and Seedance 2.0 under one subscription for creative work.

Sakana Fugu is worth watching closely over the next few months as independent benchmark verification catches up to the self-reported numbers. If the performance holds under third-party testing, it's a meaningful shift in what a single API endpoint can deliver.

AI Model Coding Vibe Coding

About the author

Harish Prajapat

AI Imaging Specialist & Lead Content Strategist

Ahmedabad, India Writing since 2021

Harish has spent the last six years testing AI image and video tools the hard way — shipping thousands of real generations for brands, marketplaces and his own side projects. At MagicShot he turns dense model releases into step-by-step workflows anyone can follow, and personally re-tests every prompt and setting before it lands in a guide. When he is not benchmarking the latest diffusion model, he is answering "which tool should I actually use?" for creators in the community.

AI Image Generation Prompt Engineering Photo Restoration Generative Workflows

View all articles by Harish

Sakana Fugu: The Multi-Agent AI That Orchestrates Claude, GPT, and Gemini at Once

Who Built Sakana Fugu

How Fugu Actually Works

The Two Research Systems Behind It

What Happens When You Send a Request

Fugu vs Fugu Ultra: Which One

Benchmark Results

The Caveat You Should Know

Pricing

Subscription Plans (include both Fugu and Fugu Ultra)

Pay-as-You-Go (token-based)

Where Fugu Excels

Best Use Cases

Where It Falls Short

How to Access Fugu

Direct Access

Third-Party Platforms

Enterprise Options

What Makes This Actually Different

Harish Prajapat

Frequently asked questions

More news

Happy Horse 1.1 Lands on MagicShot: Alibaba's Top-Ranked Text-to-Video Model

Seedream 5.0 Pro Lands: ByteDance's New Flagship Image Model Explained

Nano Banana 2 Lite Lands on MagicShot: Google's Fastest Image Model

Ready to create something magical?

Seedream 5.0 Pro

Sakana Fugu: The Multi-Agent AI That Orchestrates Claude, GPT, and Gemini at Once

Who Built Sakana Fugu

How Fugu Actually Works

The Two Research Systems Behind It

What Happens When You Send a Request

Fugu vs Fugu Ultra: Which One

Benchmark Results

The Caveat You Should Know

Pricing

Subscription Plans (include both Fugu and Fugu Ultra)

Pay-as-You-Go (token-based)

Where Fugu Excels

Best Use Cases

Where It Falls Short

How to Access Fugu

Direct Access

Third-Party Platforms

Enterprise Options

What Makes This Actually Different

Harish Prajapat

Frequently asked questions

What is Sakana Fugu?

How does Sakana Fugu compare to using a single model like Claude or GPT-5.5?

What does Sakana Fugu cost?

Is Sakana Fugu available in Europe?

What is the difference between TRINITY and Conductor in Sakana Fugu?

More news

Happy Horse 1.1 Lands on MagicShot: Alibaba's Top-Ranked Text-to-Video Model

Seedream 5.0 Pro Lands: ByteDance's New Flagship Image Model Explained

Nano Banana 2 Lite Lands on MagicShot: Google's Fastest Image Model

Ready to create something magical?