Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Today, Sakana AI launched Sakana Fugu, a multi-agent orchestration system that behaves like a single model. You send a request to one endpoint, and Fugu decides internally how to handle it. It solves a task directly when that is sufficient, and it assembles and coordinates a team of expert models when needed. The complexity of a multi-agent system never reaches your code.

TL;DR

Fugu delivers a multi-agent system behind one OpenAI-compatible API.
Fugu Ultra leads most published coding and reasoning benchmarks.
The orchestrator beats the individual models it coordinates.
Opt-out and provider routing target compliance and single-vendor risk.
Routing is proprietary, so per-query model selection stays hidden.

What is Sakana Fugu

Fugu is itself a language model. It is trained to call other LLMs in an agent pool. That pool includes instances of itself, called recursively. Fugu manages model selection, delegation, verification, and synthesis internally.

Instead of hard-coded roles or workflows, Fugu learns how to coordinate. It decides when to delegate and how agents should communicate. It then combines their work into one answer. From the outside, you call a single model. Inside, a coordinated system of experts does the work.

Sakana AI frames this as a hedge against single-vendor dependency. If one provider restricts access, Fugu routes around the disruption. The research team cites recent export controls on Anthropic’s Fable and Mythos models as motivation. Over time, newer models can be folded into the pool.

Fugu and Fugu Ultra: Two Models, One API

Fugu ships in two variants, both behind one OpenAI-compatible API:

Fugu balances strong performance with low latency. It is a default for everyday coding, code review, and chatbots. It also fits tools like Codex. You can opt specific agents out of its pool, which helps teams meet data, privacy, and compliance requirements.
Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems. It coordinates a deeper pool of expert agents. Its pool is fixed, so opt-out isn’t available. The current model ID is fugu-ultra-20260615.

The Research Behind the Orchestrator

Fugu builds on two ICLR 2026 papers on learned orchestration: TRINITY and Conductor.

TRINITY uses a lightweight evolved coordinator across multiple turns. It assigns Thinker, Worker, or Verifier roles to delegate work adaptively. Conductor is trained with reinforcement learning. It discovers natural-language coordination strategies and focused prompts for different LLM pools.

Together, they show that systems can learn to assemble and route agents per task. That replaces hand-designed workflows.
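
To make the multi-turn role assignment concrete, here is a toy sketch of a coordinator loop. It is not the TRINITY or Conductor method: the agents are stubs rather than LLM calls, and the policy is a fixed cycle where a learned coordinator would choose from the transcript. All names here (`stub_agent`, `fixed_policy`, `orchestrate`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str, str]  # (agent name, role, message)
Agent = Callable[[str, str, List[Turn]], str]


def stub_agent() -> Agent:
    """Hypothetical stand-in for an LLM call; returns canned messages per role."""
    def run(role: str, task: str, transcript: List[Turn]) -> str:
        if role == "Thinker":
            return f"plan for: {task}"
        if role == "Worker":
            return f"draft answer to: {task}"
        # Verifier: accept only if some Worker has already produced a draft.
        has_draft = any(r == "Worker" for _, r, _ in transcript)
        return "ACCEPT" if has_draft else "REJECT"
    return run


def fixed_policy(transcript: List[Turn], agents: List[str]) -> Tuple[str, str]:
    """Toy coordinator: cycles Thinker -> Worker -> Verifier, round-robin over agents.
    A learned coordinator would pick the agent and role from the transcript instead."""
    roles = ("Thinker", "Worker", "Verifier")
    turn = len(transcript)
    return agents[turn % len(agents)], roles[turn % len(roles)]


def orchestrate(task: str, pool: Dict[str, Agent],
                policy=fixed_policy, max_turns: int = 6) -> Tuple[str, List[Turn]]:
    """Run turns until a Verifier accepts or the turn budget runs out."""
    transcript: List[Turn] = []
    names = list(pool)
    for _ in range(max_turns):
        name, role = policy(transcript, names)
        message = pool[name](role, task, transcript)
        transcript.append((name, role, message))
        if role == "Verifier" and message == "ACCEPT":
            break
    # The latest Worker draft is treated as the final answer.
    drafts = [m for _, r, m in transcript if r == "Worker"]
    return (drafts[-1] if drafts else ""), transcript


if __name__ == "__main__":
    pool = {"model_a": stub_agent(), "model_b": stub_agent()}
    answer, turns = orchestrate("summarize the report", pool)
    for name, role, message in turns:
        print(f"{name} [{role}]: {message}")
    print("final:", answer)
```

The point of the sketch is the shape of the loop: the caller sees one `orchestrate` call, while agent choice, role choice, and the stopping rule stay inside it.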

Benchmark

Sakana AI compares Fugu against the foundation models it orchestrates. Baselines use provider-reported scores. SWE Bench Pro uses mini-swe-agent as scaffolding.

Benchmark	Fugu	Fugu Ultra	Opus 4.8	Gemini 3.1 Pro	GPT 5.5
SWE Bench Pro*	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1	80.2	82.1	74.6	70.3	78.2
LiveCodeBench	92.9	93.2	87.8	88.5	85.3
LiveCodeBench Pro	87.8	90.8	84.8	82.9	88.4
Humanity’s Last Exam	47.2	50.0	49.8	44.4	41.4
CharXiv Reasoning	85.1	86.6	84.2	83.3	84.1
GPQA-D	95.5	95.5	92.0	94.3	93.6
SciCode	60.1	58.7	53.5	58.9	56.1
τ³ Banking	21.7	20.6	20.6	8.4	20.6
Long Context Reasoning	74.7	73.3	67.7	72.7	74.3
MRCRv2	86.6	93.6	87.9	84.9	94.8

The orchestrator posts the top score on 10 of 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam. It ties regular Fugu on GPQA-D. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here.
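
The row-by-row tally can be checked mechanically. The sketch below copies the scores from the table and counts the rows where every top scorer is a Fugu variant. The helper names are illustrative only.

```python
# Scores copied from the table; column order is
# Fugu, Fugu Ultra, Opus 4.8, Gemini 3.1 Pro, GPT 5.5.
MODELS = ["Fugu", "Fugu Ultra", "Opus 4.8", "Gemini 3.1 Pro", "GPT 5.5"]
ORCHESTRATORS = {"Fugu", "Fugu Ultra"}

SCORES = {
    "SWE Bench Pro": [59.0, 73.7, 69.2, 54.2, 58.6],
    "TerminalBench 2.1": [80.2, 82.1, 74.6, 70.3, 78.2],
    "LiveCodeBench": [92.9, 93.2, 87.8, 88.5, 85.3],
    "LiveCodeBench Pro": [87.8, 90.8, 84.8, 82.9, 88.4],
    "Humanity's Last Exam": [47.2, 50.0, 49.8, 44.4, 41.4],
    "CharXiv Reasoning": [85.1, 86.6, 84.2, 83.3, 84.1],
    "GPQA-D": [95.5, 95.5, 92.0, 94.3, 93.6],
    "SciCode": [60.1, 58.7, 53.5, 58.9, 56.1],
    "τ³ Banking": [21.7, 20.6, 20.6, 8.4, 20.6],
    "Long Context Reasoning": [74.7, 73.3, 67.7, 72.7, 74.3],
    "MRCRv2": [86.6, 93.6, 87.9, 84.9, 94.8],
}


def leaders(row):
    """Return every model that shares the top score in a row (handles ties)."""
    best = max(row)
    return [model for model, score in zip(MODELS, row) if score == best]


def orchestrator_wins(scores):
    """Rows where all top scorers are Fugu variants."""
    return [name for name, row in scores.items()
            if set(leaders(row)) <= ORCHESTRATORS]


if __name__ == "__main__":
    wins = orchestrator_wins(SCORES)
    print(f"{len(wins)} of {len(SCORES)} rows led by a Fugu variant")
    for name, row in SCORES.items():
        print(f"{name}: {', '.join(leaders(row))}")
```

Note that these are the provider-reported baselines described above, so the tally checks the table's internal consistency rather than the underlying measurements.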

Sakana AI’s Fugu models stand shoulder-to-shoulder with Anthropic’s Fable 5 and Mythos Preview. Those two are not in Fugu’s pool, since they are not publicly available.

Use Cases

Sakana AI ran a beta with close to 500 early users. The published examples favor long, multi-step tasks.

AutoResearch: An agent improved a small GPT’s training recipe autonomously. It ran 123 experiments over roughly 14 hours on one H100 GPU. Fugu Ultra reached the best mean validation BPB of 0.9774, with a best single run of 0.9748.
Rubik’s Cube solver: Each model wrote a pure-Python solver, no libraries allowed. Fugu Ultra solved all 300 held-out cubes, averaging 19.72 moves. One baseline matched it closely at 19.76 moves. Two others crashed and solved none.
Classical Japanese kana reading order: On a 1610 letter, Fugu Ultra scored NED 0.80. The nearest baseline reached only 0.24.
Blindfold chess: Fugu played four games from memory, with no board shown. It beat three frontier models and a 2100-Elo Stockfish engine.
Online trading: On one 50-week window, Fugu Ultra returned +19.43% on average across 5 runs. The other frontier models stayed below +15%. Sakana AI notes past performance doesn’t guarantee future results.
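
The article does not define the NED score used in the kana reading-order example. Since the higher number is reported as the better result, one plausible reading is a similarity of the form 1 minus normalized edit distance. The sketch below implements that reading purely as an assumption. It may not match the metric Sakana AI used, and `ned_similarity` is a hypothetical name.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic edit distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ned_similarity(pred: Sequence, ref: Sequence) -> float:
    """ASSUMED definition: 1 - edit_distance / max(len). 1.0 means identical order."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


if __name__ == "__main__":
    reference = ["c1", "c2", "c3", "c4"]   # hypothetical character IDs in true order
    predicted = ["c2", "c1", "c3", "c4"]   # first two characters swapped
    print(ned_similarity(predicted, reference))
```

Under this reading, a predicted reading order is compared to the reference as a sequence of character IDs, so a single swapped pair costs two edits.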

A Minimal API Example

Fugu uses an OpenAI-compatible API, so no SDK migration is required. Point an existing client at your console-provided endpoint.

from openai import OpenAI

# Endpoint and key come from your Sakana console (console.sakana.ai).
client = OpenAI(
    base_url="https://<your-fugu-endpoint>/v1",  # from console.sakana.ai
    api_key="YOUR_SAKANA_API_KEY",
)

# Send one chat request; any delegation happens behind the endpoint.
resp = client.chat.completions.create(
    model="fugu-ultra-20260615",           # or "fugu"
    messages=[
        {"role": "user",
         "content": "Reproduce the method in this paper and report the gap."},
    ],
)

# Print the text of the first returned completion.
print(resp.choices[0].message.content)

Token usage and cost are reported per request, so you can track spend in real time.
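
A minimal sketch of client-side spend tracking follows. It assumes Fugu fills the standard `usage` object of OpenAI-style chat responses (`prompt_tokens`, `completion_tokens`), which the article does not confirm field by field. The per-token rates are placeholders, not Sakana AI pricing, and `SpendTracker` is a hypothetical helper.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class SpendTracker:
    """Accumulates token counts across responses that carry a usage object."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, resp) -> None:
        # Responses without usage data are skipped rather than guessed at.
        usage = getattr(resp, "usage", None)
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.requests += 1

    def estimated_cost(self, usd_per_1m_prompt: float,
                       usd_per_1m_completion: float) -> float:
        """Estimate cost from caller-supplied rates (placeholders, not real prices)."""
        return (self.prompt_tokens * usd_per_1m_prompt
                + self.completion_tokens * usd_per_1m_completion) / 1_000_000


if __name__ == "__main__":
    tracker = SpendTracker()
    # Stand-in for a real response object, so the sketch runs offline.
    fake_resp = SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=800)
    )
    tracker.record(fake_resp)
    print(tracker)
    print(tracker.estimated_cost(usd_per_1m_prompt=1.0, usd_per_1m_completion=1.0))
```

With a real client, calling `tracker.record(resp)` after each request would accumulate the same counters. If the API also returns a cost figure, that value is the better source than a local estimate.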

Community Reactions

Sakana Fugu — Early Community Sentiment

A manual review of public reaction on X and Hacker News, with links to every source. Captured June 22, 2026.

Sentiment split (n = 12 posts reviewed)

Supportive: 3
Skeptical: 6
Critical: 3

Early reaction skews skeptical. The “is this just a router or wrapper?” question dominates. The clearest supportive voices are Sakana-affiliated.

Press & evaluation

Hacker News thread · 50 pts
VentureBeat report
Clanker Cloud analysis

Method: sentiment was assigned by hand from a small sample of public posts on June 22, 2026. This isn’t a statistical survey, and the split can shift as more reactions arrive. Two of the three supportive posts are from Sakana AI or its CEO. Quotes are shortened; follow each link for full context. The Reddit quote is as reported by VentureBeat.

Marktechpost · Sakana Fugu sentiment tracker
Sources: X · Hacker News · VentureBeat
