Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs
Today, Sakana AI launched Sakana Fugu. It is a multi-agent orchestration system that behaves like one model. You send a request to a single endpoint, and Fugu decides internally how to handle it. It solves a task directly when that is enough, and it assembles and coordinates a team of expert models when needed. The complexity of a multi-agent system never reaches your code.
TL;DR
- Fugu delivers a multi-agent system behind one OpenAI-compatible API.
- Fugu Ultra leads most published coding and reasoning benchmarks.
- The orchestrator beats the individual models it coordinates on 10 of 11 reported benchmarks.
- Opt-out and provider routing target compliance and single-vendor risk.
- Routing is proprietary, so per-query model selection stays hidden.
What is Sakana Fugu
Fugu is itself a language model. It is trained to call other LLMs in an agent pool, and that pool includes instances of Fugu itself, called recursively. Fugu handles model selection, delegation, verification, and synthesis internally.
Instead of hard-coded roles or workflows, Fugu learns how to coordinate. It decides when to delegate and how agents should communicate, then combines their work into one answer. From the outside, you call a single model. Inside, a coordinated system of experts does the work.
Sakana AI frames this as a hedge against single-vendor dependency. If one provider restricts access, Fugu routes around the disruption. The research team cites recent export controls on Anthropic's Fable and Mythos models as motivation. Over time, newer models can be folded into the pool.
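Fugu's coordination policy is learned and proprietary, so nothing below reproduces it. This is a toy, hand-written illustration of the four jobs named above: selection, delegation, verification, and synthesis. It also shows a recursive call back into the orchestrator itself. The `Expert` stubs, the split-on-semicolon rule, the `verify` check, and the join-based synthesis are all hypothetical placeholders.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Expert:
    """Hypothetical stand-in for one model in the agent pool."""
    name: str
    answer: Callable[[str], str]


Verifier = Callable[[str, str], bool]


def solve_directly(task: str, experts: list[Expert], verify: Verifier) -> str:
    # Selection + verification: try experts in order and keep the first
    # answer that passes the check.
    for expert in experts:
        answer = expert.answer(task)
        if verify(task, answer):
            return f"{expert.name}: {answer}"
    return "unverified"


def orchestrate(task: str, experts: list[Expert], verify: Verifier,
                depth: int = 0, max_depth: int = 2) -> str:
    # Placeholder decomposition rule: ';' separates subtasks.
    parts = [p.strip() for p in task.split(";") if p.strip()]
    # Solve directly when there is nothing to split or the depth budget is spent.
    if len(parts) <= 1 or depth >= max_depth:
        return solve_directly(task, experts, verify)
    answers = []
    for i, part in enumerate(parts):
        # Delegation: rotate the pool so each subtask starts with a different
        # expert, then recurse into another instance of the orchestrator.
        k = i % len(experts)
        answers.append(orchestrate(part, experts[k:] + experts[:k], verify,
                                   depth + 1, max_depth))
    # Synthesis: a plain join here; a real system would merge with a model.
    return " | ".join(answers)


def verify(task: str, answer: str) -> bool:
    return answer.isupper()  # toy acceptance check


experts = [Expert("upper", str.upper), Expert("reverse", lambda t: t[::-1])]
print(orchestrate("ab; cd", experts, verify))  # prints: upper: AB | upper: CD
```

For the second subtask, the rotated pool tries `reverse` first. Its answer fails the check, so the sketch falls back to `upper`. A learned orchestrator would make that choice from experience rather than from a fixed order.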
Fugu and Fugu Ultra: Two Models, One API
Fugu ships in two variants, both behind one OpenAI-compatible API:
- Fugu balances strong performance with low latency. It serves as the default for everyday coding, code review, and chatbots, and it also fits tools like Codex. You can opt specific agents out of its pool, which helps teams meet data, privacy, and compliance requirements.
- Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems. It coordinates a deeper pool of expert agents. Its pool is fixed, so opt-out is not available. The current model ID is `fugu-ultra-20260615`.
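The article does not say how opt-out is expressed (as a console setting or a request parameter). It also does not list the agents in either pool. This sketch therefore models only the stated rule: regular Fugu accepts an opt-out list, while Fugu Ultra's fixed pool rejects one. The pool contents, the `agent-*` names, and `effective_pool` are hypothetical. The `"fugu"` ID comes from the comment in the API example.

```python
from __future__ import annotations

# Hypothetical pool contents; the real pools are not published here.
POOLS: dict[str, frozenset[str]] = {
    "fugu": frozenset({"agent-a", "agent-b", "agent-c"}),
    "fugu-ultra-20260615": frozenset({"agent-a", "agent-b", "agent-c", "agent-d"}),
}
# Per the article, Fugu Ultra's pool is fixed, so opt-out is unavailable.
FIXED_POOL_MODELS = {"fugu-ultra-20260615"}


def effective_pool(model: str, opt_out: set[str] | None = None) -> frozenset[str]:
    """Return the agents a request may use after applying an opt-out list."""
    opt_out = set(opt_out or ())
    pool = POOLS[model]
    if opt_out and model in FIXED_POOL_MODELS:
        raise ValueError(f"{model} has a fixed pool; opt-out is not available")
    unknown = opt_out - pool
    if unknown:
        raise ValueError(f"unknown agents: {sorted(unknown)}")
    remaining = pool - opt_out
    if not remaining:
        raise ValueError("opt-out would leave the pool empty")
    return remaining


print(sorted(effective_pool("fugu", {"agent-b"})))  # ['agent-a', 'agent-c']
```

Rejecting unknown names and an empty result is a design choice in this sketch, not documented Fugu behavior. It surfaces a typo in a compliance list instead of silently ignoring it.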
The Research Behind the Orchestrator
Fugu builds on two ICLR 2026 papers on learned orchestration: TRINITY and the Conductor.
TRINITY uses a lightweight evolved coordinator across multiple turns. It assigns Thinker, Worker, or Verifier roles to delegate work adaptively. The Conductor is trained with reinforcement learning. It discovers natural-language coordination strategies and focused prompts for different LLM pools.
Together, they show that systems can learn to assemble and route agents per task, replacing hand-designed workflows.
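This toy loop makes the Thinker/Worker/Verifier vocabulary concrete. Each turn, a coordinator picks a role, and a Verifier rejection sends work back to a Worker. It is not the paper's method: TRINITY's coordinator is learned, whereas `choose_role` here is a fixed hand-written rule. The role-to-stub mapping, the prompts, and `run_trinity_style` are hypothetical.

```python
from __future__ import annotations

import itertools
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    THINKER = "thinker"
    WORKER = "worker"
    VERIFIER = "verifier"


@dataclass
class State:
    plan: str | None = None
    draft: str | None = None
    approved: bool = False


def choose_role(state: State) -> Role:
    # Hand-written stand-in for a learned coordinator policy.
    if state.plan is None:
        return Role.THINKER
    if state.draft is None:
        return Role.WORKER
    return Role.VERIFIER


def run_trinity_style(task: str, llms: dict[Role, Callable[[str], str]],
                      max_turns: int = 6) -> tuple[State, list[str]]:
    state, trace = State(), []
    for _ in range(max_turns):
        role = choose_role(state)
        trace.append(role.value)
        llm = llms[role]
        if role is Role.THINKER:
            state.plan = llm(f"Plan how to: {task}")
        elif role is Role.WORKER:
            state.draft = llm(f"Carry out: {state.plan}")
        elif llm(f"Check: {state.draft}") == "ok":
            state.approved = True
            break
        else:
            state.draft = None  # rejected: hand the work back to a Worker
    return state, trace


# Stub models: the Verifier only accepts the Worker's second attempt.
attempts = itertools.count(1)
llms = {
    Role.THINKER: lambda prompt: "outline",
    Role.WORKER: lambda prompt: f"draft-{next(attempts)}",
    Role.VERIFIER: lambda prompt: "ok" if prompt.endswith("draft-2") else "revise",
}
state, trace = run_trinity_style("sort a list", llms)
print(trace)  # ['thinker', 'worker', 'verifier', 'worker', 'verifier']
```

The `max_turns` budget bounds the loop. If the Verifier never approves, the run stops with `approved` still `False` rather than cycling forever.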
Benchmarks
Sakana AI compares Fugu against the foundation models it orchestrates. Baselines use provider-reported scores. SWE Bench Pro (marked *) uses mini-swe-agent as scaffolding.
| Benchmark | Fugu | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|---|
| SWE Bench Pro* | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| LiveCodeBench Pro | 87.8 | 90.8 | 84.8 | 82.9 | 88.4 |
| Humanity’s Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| CharXiv Reasoning | 85.1 | 86.6 | 84.2 | 83.3 | 84.1 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
| SciCode | 60.1 | 58.7 | 53.5 | 58.9 | 56.1 |
| τ³ Banking | 21.7 | 20.6 | 20.6 | 8.4 | 20.6 |
| Long Context Reasoning | 74.7 | 73.3 | 67.7 | 72.7 | 74.3 |
| MRCRv2 | 86.6 | 93.6 | 87.9 | 84.9 | 94.8 |
A Fugu variant posts the top score on 10 of 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity's Last Exam. It ties regular Fugu on GPQA-D. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here.
Sakana AI's Fugu models stand shoulder-to-shoulder with Anthropic's Fable 5 and Mythos Preview. Those two are not in Fugu's pool because they are not publicly available.
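The row-by-row tally of the benchmark table can be checked by hand, but a short script makes it reproducible. This sketch copies the eleven rows from the table. Like the article, it treats the highest score in each row as the leader, assuming higher is better on every benchmark. Ties credit every tied model. `MODELS`, `SCORES`, and `leaders` are illustrative names.

```python
# Scores copied from the benchmark table; columns follow the table's order.
MODELS = ["Fugu", "Fugu Ultra", "Opus 4.8", "Gemini 3.1 Pro", "GPT 5.5"]
SCORES = {
    "SWE Bench Pro": [59.0, 73.7, 69.2, 54.2, 58.6],
    "TerminalBench 2.1": [80.2, 82.1, 74.6, 70.3, 78.2],
    "LiveCodeBench": [92.9, 93.2, 87.8, 88.5, 85.3],
    "LiveCodeBench Pro": [87.8, 90.8, 84.8, 82.9, 88.4],
    "Humanity's Last Exam": [47.2, 50.0, 49.8, 44.4, 41.4],
    "CharXiv Reasoning": [85.1, 86.6, 84.2, 83.3, 84.1],
    "GPQA-D": [95.5, 95.5, 92.0, 94.3, 93.6],
    "SciCode": [60.1, 58.7, 53.5, 58.9, 56.1],
    "τ³ Banking": [21.7, 20.6, 20.6, 8.4, 20.6],
    "Long Context Reasoning": [74.7, 73.3, 67.7, 72.7, 74.3],
    "MRCRv2": [86.6, 93.6, 87.9, 84.9, 94.8],
}
FUGU = {"Fugu", "Fugu Ultra"}


def leaders(row: list[float]) -> list[str]:
    """Return every model tied for the highest score in a row."""
    best = max(row)
    return [model for model, score in zip(MODELS, row) if score == best]


fugu_rows = [name for name, row in SCORES.items() if FUGU & set(leaders(row))]
baseline_rows = [name for name in SCORES if name not in fugu_rows]
print(f"{len(fugu_rows)} of {len(SCORES)} rows led by a Fugu variant")  # 10 of 11 ...
print("Baseline wins:", baseline_rows)  # Baseline wins: ['MRCRv2']
```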
Use Cases
Sakana AI ran a beta with close to 500 early users. The published examples favor long, multi-step tasks.
- AutoResearch: An agent autonomously improved a small GPT's training recipe. It ran 123 experiments over roughly 14 hours on one H100 GPU. Fugu Ultra reached the best mean validation BPB of 0.9774, with a best single run of 0.9748.
- Rubik's Cube solver: Each model wrote a pure-Python solver, with no libraries allowed. Fugu Ultra solved all 300 held-out cubes, averaging 19.72 moves. One baseline came close at 19.76 moves. Two others crashed and solved none.
- Classical Japanese kana reading order: On a 1610 letter, Fugu Ultra scored an NED of 0.80. The nearest baseline reached only 0.24.
- Blindfold chess: Fugu played four games from memory, with no board shown. It beat three frontier models and a 2100-Elo Stockfish engine.
- Online trading: Over one 50-week window, Fugu Ultra returned +19.43% on average across five runs. The other frontier models stayed below +15%. Sakana AI notes that past performance does not guarantee future results.
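The AutoResearch result is reported in validation BPB, which we take to mean bits per byte. Lower is better. Because the loss is normalized by raw bytes rather than tokens, runs with different tokenizers stay comparable. Below is a minimal conversion from summed cross-entropy in nats. The function names and example numbers are illustrative, not values from the run.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed next-token cross-entropy (natural log) into bits per byte.

    Dividing by ln(2) turns nats into bits; dividing by the byte count
    normalizes by raw text length instead of token count.
    """
    return total_loss_nats / (math.log(2) * total_bytes)


def bpb_from_mean_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    # A mean per-token loss times the token count recovers the summed loss.
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)


# Illustrative numbers only: 1,000 tokens covering 4,200 bytes of validation text.
print(round(bpb_from_mean_loss(2.8, 1_000, 4_200), 4))
```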
A Minimal API Example
Fugu uses an OpenAI-compatible API, so no SDK migration is required. Point an existing client at the endpoint shown in your console.
```python
from openai import OpenAI

# Endpoint and key come from your Sakana console (console.sakana.ai).
client = OpenAI(
    base_url="https://<your-fugu-endpoint>/v1",  # from console.sakana.ai
    api_key="YOUR_SAKANA_API_KEY",
)

resp = client.chat.completions.create(
    model="fugu-ultra-20260615",  # or "fugu"
    messages=[
        {"role": "user",
         "content": "Reproduce the method in this paper and report the gap."},
    ],
)
print(resp.choices[0].message.content)
```
Token usage and cost are reported per request, so you can track spend in real time.
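The article does not name the field that carries Fugu's per-request cost. This sketch therefore reads only the `usage.prompt_tokens` and `usage.completion_tokens` attributes that OpenAI-compatible chat responses normally carry, and it assumes Fugu fills them. The per-million-token prices and the `SpendTracker` class are hypothetical. Where Fugu reports its own cost figure, prefer that over this estimate.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class SpendTracker:
    """Accumulates token usage across responses and estimates spend.

    Prices are hypothetical USD-per-million-token placeholders.
    """
    price_in_per_m: float
    price_out_per_m: float
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, usage) -> None:
        # `usage` is the OpenAI-style `resp.usage` object (or any object
        # exposing the same two attributes).
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def estimated_cost(self) -> float:
        return (self.prompt_tokens * self.price_in_per_m
                + self.completion_tokens * self.price_out_per_m) / 1_000_000


tracker = SpendTracker(price_in_per_m=2.0, price_out_per_m=8.0)
# With a live client, call tracker.record(resp.usage) after each request;
# a stand-in object keeps this example offline.
tracker.record(SimpleNamespace(prompt_tokens=1_000, completion_tokens=500))
print(f"${tracker.estimated_cost:.4f}")  # $0.0060
```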
Community Reactions
Sakana Fugu: Early Community Sentiment
A manual review of public reaction on X and Hacker News, with links to every source. Captured June 22, 2026.
12 posts reviewed
Sources: X · Hacker News · VentureBeat
