Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides
Tokyo-based Sakana AI shipped its first commercial product, ‘Sakana Marlin’, this week. The Sakana team positions it as a Virtual CSO (Chief Strategy Officer). It is a B2B autonomous research agent built for enterprises.
Marlin doesn’t reply in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to about eight hours. Each run returns a long report plus a presentation slide deck. Sakana says a single session issues hundreds to thousands of LLM queries.
What is Sakana Marlin
Marlin is an enterprise research agent, not a chat assistant. You give it one topic or question. It then plans hypotheses, browses sources, and verifies findings on its own. It compresses weeks of strategy work into hours.
The deliverable is structured for decision-makers. The Japanese announcement describes reports of dozens of pages. The English announcement cites reports of up to roughly 100 pages. At a press hands-on, reports ran 60–100 pages and cited 60–80 sources. Each report features a main body, references, and appendices. Presentation slides are generated using image-generation AI.
The Sakana team refined Marlin through a closed beta in April 2026. Around 300 professionals tested it on real tasks during that beta. Those tasks spanned strategy formulation, market research, risk assessment, and competitive analysis. Sakana has also partnered with MUFG and taken strategic investment from Citigroup.
Inside AB-MCTS: Wider or Deeper
The backbone of Marlin is AB-MCTS, or Adaptive Branching Monte Carlo Tree Search. It comes from Sakana’s earlier research paper, “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search.”
AB-MCTS treats reasoning as a tree-search problem. At each step the algorithm makes one decision. It can go wider by generating a new candidate answer. Or it can go deeper by refining a promising existing answer. Standard repeated sampling only goes wider in parallel, then hopes one answer is correct.
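The wider-or-deeper choice can be illustrated with a deliberately simplified toy. The sketch below is not Sakana's published algorithm, which operates over a full search tree. It flattens the problem to a two-armed bandit that uses Thompson sampling to pick between generating a new candidate ("wider") and refining the current best one ("deeper"). The function name `toy_wider_or_deeper`, the random scores, and the refinement noise range are all assumptions made for illustration.

```python
import random


def toy_wider_or_deeper(budget=20, seed=0):
    """Toy two-armed Thompson sampler over 'wider' and 'deeper' (illustrative only)."""
    rng = random.Random(seed)
    candidates = [rng.random()]  # candidate scores in [0, 1]; start with one draft
    # Per-action [alpha, beta] parameters of a Beta posterior over
    # "this action improved on the best score so far".
    stats = {"wider": [1, 1], "deeper": [1, 1]}
    log = []
    for _ in range(budget):
        # Draw a plausible success rate for each action and take the larger draw.
        draws = {a: rng.betavariate(s, f) for a, (s, f) in stats.items()}
        action = max(draws, key=draws.get)
        best = max(candidates)
        if action == "wider":
            new_score = rng.random()  # stand-in for a brand-new candidate answer
        else:
            # Stand-in for refining the best answer: a small noisy change,
            # clamped so the score stays in [0, 1].
            new_score = min(1.0, max(0.0, best + rng.uniform(-0.05, 0.10)))
        improved = new_score > best
        stats[action][0 if improved else 1] += 1  # update the chosen action's posterior
        candidates.append(new_score)
        log.append(action)
    return max(candidates), log


best_score, decisions = toy_wider_or_deeper()
print("best score:", round(best_score, 3))
print("decisions:", decisions)
```

In a real setting both stand-ins would be LLM calls, and the score would come from a judge or a test harness rather than a random number generator.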
A multi-LLM variant adds a second choice. It can route a step to a different model entirely. In Sakana’s reported ARC-AGI-2 experiments, this collaboration helped. Combining o4-mini, Gemini 2.5 Pro, and DeepSeek-R1 solved about 27.5% of tasks. The o4-mini model alone solved about 23%. Marlin applies the same adaptive search to long-horizon research.
The second key component for Marlin is workflow automation from Sakana’s AI Scientist project. That project demonstrated autonomous scientific discovery and was published in Nature.
How Marlin Compares
Marlin competes on depth, not speed. Conventional deep-research tools reply in minutes to tens of minutes. Marlin deliberately spends hours to raise output quality. The competitor run times below are approximate and reported, not official figures.
| Tool | Typical run time | Output | Primary user |
|---|---|---|---|
| Sakana Marlin | Up to ~8 hours | Report (dozens to ~100 pages) + slides | Enterprise strategy teams |
| OpenAI Deep Research | ~Minutes to tens of minutes | Cited text report | General and professional users |
| Perplexity Deep Research | ~A few minutes | Cited text answer | General users |
| Google Gemini Deep Research | ~Minutes | Cited text report | General and workspace users |
The trade-off is explicit. You wait longer and pay per run. In return you get deeper hypothesis testing and a finished deliverable. You can cancel a run anytime, but credits are still consumed.
Pricing
Sakana offers pay-as-you-go along with Pro, Team, and Enterprise tiers. Pay-as-you-go starts at 100 credits per run, at ¥98 per credit. Pro is ¥150,000 per month and includes 2,000 credits. Team is ¥400,000 per month and includes 6,000 credits. Enterprise pricing is custom, with dedicated support.
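To make these figures easier to compare, here is a small arithmetic sketch. It assumes every run costs a flat 100 credits, which is only the stated entry point: actual runs may consume more, and that would shift the break-even numbers. The names `PLANS` and `summarize` are illustrative.

```python
import math

PAYG_YEN_PER_CREDIT = 98
CREDITS_PER_RUN = 100  # assumption: flat entry-level cost; real runs may use more

# (monthly fee in yen, included credits), taken from the pricing paragraph
PLANS = {"Pro": (150_000, 2_000), "Team": (400_000, 6_000)}

payg_run_cost = PAYG_YEN_PER_CREDIT * CREDITS_PER_RUN


def summarize(fee, credits):
    """Effective credit price, included runs, and runs needed to beat pay-as-you-go."""
    return {
        "yen_per_credit": fee / credits,
        "runs_included": credits // CREDITS_PER_RUN,
        "break_even_runs": math.ceil(fee / payg_run_cost),
    }


summary = {name: summarize(fee, credits) for name, (fee, credits) in PLANS.items()}

print(f"Pay-as-you-go: ¥{payg_run_cost:,} per run")
for name, row in summary.items():
    print(name, row)
```

Under that assumption, a pay-as-you-go run costs ¥9,800. Pro works out to ¥75 per credit with 20 runs included, and pays for itself from the 16th run in a month. Team works out to roughly ¥66.67 per credit with 60 runs included, and pays for itself from the 41st run.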
Use Cases, With Examples
Marlin suits high-stakes questions where research is the bottleneck. Here are concrete examples drawn from its target tasks.
- Market entry: 'Assess Japan's stablecoin and tokenized-payments market after regulatory change.' Marlin maps drivers, risks, and structured options into a report.
- Risk assessment: 'Model resolution scenarios for a Strait of Hormuz blockade.' It compares hypotheses, not just summaries, before drawing conclusions.
- Competitive analysis: 'Profile three rivals and rank our positioning gaps.' It returns slides ready for a strategy review.
Each example fits one prompt and one unattended run. A human still reviews the cited output before any decision.
Try the Engine Yourself: TreeQuest
You cannot self-host Marlin. But you can run its core algorithm today. Sakana open-sourced AB-MCTS as TreeQuest under the Apache 2.0 license. Install it, define a generate function, then run a fixed search budget.
```python
import random

import treequest as tq


# Each node holds a user-defined state; the score must be normalized to [0, 1].
def generate(parent_state):
    if parent_state is None:  # None means expand from the root
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    score = random.random()  # swap this for an LLM-based score
    return new_state, score


algo = tq.ABMCTSA()  # Adaptive Branching MCTS (variant A)
search_tree = algo.init_tree()
for _ in range(10):  # generation budget of 10
    search_tree = algo.step(search_tree, {"generate": generate})

best_state, best_score = tq.top_k(search_tree, algo, k=1)[0]
print("BEST:", best_state, round(best_score, 3))
```
Swap the random score for an LLM judge to reproduce the real pattern. TreeQuest also ships multi-LLM search and checkpointing for long runs. Checkpointing matters because long sessions can hit API errors midway.
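One way to make that swap is sketched below. It keeps the same generate contract as the example above (return a state and a score in [0, 1]) but takes the score from a judge's text reply. Everything here is an assumption for illustration: `fake_judge` is a hypothetical stand-in for a real LLM call, the 0 to 10 scale is arbitrary, and treating an unparseable reply as the worst score is just one possible policy.

```python
import re


def parse_judge_score(reply: str, scale: float = 10.0) -> float:
    """Take the first number in a judge reply and normalize it to [0, 1]."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0  # assumption: an unparseable reply counts as the worst score
    return min(1.0, max(0.0, float(match.group()) / scale))


def fake_judge(text: str) -> str:
    """Hypothetical stand-in for an LLM judge; replace with a real client call."""
    return f"Score: {min(10, len(text) // 10)}/10"


def generate(parent_state):
    """Same contract as the random-score version: return (state, score in [0, 1])."""
    if parent_state is None:
        new_state = "Initial draft"
    else:
        new_state = f"Refined: {parent_state}"
    return new_state, parse_judge_score(fake_judge(new_state))


state, score = generate(None)
print(state, score)
```

Clamping matters because a judge can return values outside the requested scale, and the search expects scores in [0, 1].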
Strengths and Weaknesses
Strengths
- Peer-reviewed foundations: AB-MCTS at NeurIPS and AI Scientist in Nature.
- Finished deliverables, including references, appendices, and slides.
- Adaptive compute spends effort on the most promising branches.
- The open-source core (TreeQuest) lets AI researchers inspect the method.
Weaknesses
- Long runtimes make iteration slow versus minute-scale research tools.
- Automated reports can contain hard-to-spot errors that need human review.
- Pricing and design target enterprises, not individual developers.
- Marlin itself is closed; only the underlying algorithm is open.
Key Takeaways
- Sakana Marlin runs autonomous research for up to about eight hours per task.
- One run produces a report of dozens of pages, plus slides.
- It builds on AB-MCTS (NeurIPS 2025 Spotlight) and AI Scientist workflows (Nature).
- Entry pricing is pay-as-you-go: 100 credits per run at ¥98 per credit.
- It targets finance, corporate strategy, consulting, and think-tank teams.
Sources
- Sakana AI, Sakana Marlin launch: https://sakana.ai/marlin-release/
- Sakana AI, Sakana Marlin product page: https://sakana.ai/marlin/
- Sakana AI, AB-MCTS research and TreeQuest: https://sakana.ai/ab-mcts/
- SakanaAI/treequest (GitHub, Apache 2.0): https://github.com/SakanaAI/treequest
The post Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides appeared first on MarkTechPost.
