MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster

ByRicardo October 29, 2025October 29, 2025

Can an open supply MoE really energy agentic coding workflows at a fraction of flagship mannequin prices whereas sustaining long-horizon software use throughout MCP, shell, browser, retrieval, and code? MiniMax crew has simply launched MiniMax-M2, a mix of consultants MoE mannequin optimized for coding and agent workflows. The weights are printed on Hugging Face underneath the MIT license, and the mannequin is positioned as for finish to finish software use, multi file enhancing, and lengthy horizon plans, It lists 229B whole parameters with about 10B lively per token, which retains reminiscence and latency in examine throughout agent loops.

https://github.com/MiniMax-AI/MiniMax-M2

Architecture and why activation dimension issues?

MiniMax-M2 is a compact MoE that routes to about 10B lively parameters per token. The smaller activations cut back reminiscence stress and tail latency in plan, act, and confirm loops, and enable extra concurrent runs in CI, browse, and retrieval chains. This is the efficiency price range that allows the velocity and value claims relative to dense fashions of comparable high quality.

MiniMax-M2 is an interleaved considering mannequin. The analysis crew wrapped inner reasoning in <suppose>...</suppose> blocks, and instructs customers to maintain these blocks within the dialog historical past throughout turns. Removing these segments harms high quality in multi step duties and software chains. This requirement is specific on the model page on HF.

Benchmarks that concentrate on coding and brokers

The MiniMax crew reviews a set of agent and code evaluations are nearer to developer workflows than static QA. On Terminal Bench, the desk exhibits 46.3. On Multi SWE Bench, it exhibits 36.2. On BrowseComp, it exhibits 44.0. SWE Bench Verified is listed at 69.4 with the scaffold element, OpenFingers with 128k context and 100 steps.

MiniMax’s official announcement stresses 8% of Claude Sonnet pricing, and close to 2x velocity, plus a free entry window. The similar word supplies the particular token costs and the trial deadline.

Comparison M1 vs M2

Aspect	MiniMax M1	MiniMax M2
Total parameters	456B whole	229B in mannequin card metadata, mannequin card textual content says 230B whole
Active parameters per token	45.9B lively	10B lively
Core design	Hybrid Mixture of Experts with Lightning Attention	Sparse Mixture of Experts concentrating on coding and agent workflows
Thinking format	Thinking price range variants 40k and 80k in RL coaching, no suppose tag protocol required	Interleaved considering with `<suppose>...</suppose>` segments that have to be preserved throughout turns
Benchmarks highlighted	AIME, LiveCodeBench, SWE-bench Verified, TAU-bench, lengthy context MRCR, MMLU-Pro	Terminal-Bench, Multi SWE-Bench, SWE-bench Verified, BrowseComp, GAIA textual content solely, Artificial Analysis intelligence suite
Inference defaults	temperature 1.0, high p 0.95	mannequin card exhibits temperature 1.0, high p 0.95, high ok 40, launch web page exhibits high ok 20
Serving steering	vLLM really helpful, Transformers path additionally documented	vLLM and SGLang really helpful, software calling information offered
Primary focus	Long context reasoning, environment friendly scaling of take a look at time compute, CISPO reinforcement studying	Agent and code native workflows throughout shell, browser, retrieval, and code runners

Key Takeaways

M2 ships as open weights on Hugging Face underneath MIT, with safetensors in F32, BF16, and FP8 F8_E4M3.
The mannequin is a compact MoE with 229B whole parameters and ~10B lively per token, which the cardboard ties to decrease reminiscence use and steadier tail latency in plan, act, confirm loops typical of brokers.
Outputs wrap inner reasoning in <suppose>...</suppose> and the mannequin card explicitly instructs retaining these segments in dialog historical past, warning that removing degrades multi-step and tool-use efficiency.
Reported outcomes cowl Terminal-Bench, (Multi-)SWE-Bench, BrowseComp, and others, with scaffold notes for reproducibility, and day-0 serving is documented for SGLang and vLLM with concrete deploy guides.

Editorial Notes

MiniMax M2 lands with open weights underneath MIT, a mix of consultants design with 229B whole parameters and about 10B activated per token, which targets agent loops and coding duties with decrease reminiscence and steadier latency. It ships on Hugging Face in safetensors with FP32, BF16, and FP8 codecs, and supplies deployment notes plus a chat template. The API paperwork Anthropic appropriate endpoints and lists pricing with a restricted free window for analysis. vLLM and SGLang recipes can be found for native serving and benchmarking. Overall, MiniMax M2 is a really stable open launch.

Check out the API Doc, Weights and Repo. Feel free to take a look at our GitHub Page for Tutorials, Codes and Notebooks. Also, be happy to observe us on Twitter and don’t overlook to affix our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

The put up MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster appeared first on MarkTechPost.

Agentic AI AI Shorts

Moonshot AI Releases Kosong: The LLM Abstraction Layer that Powers Kimi CLI
ByRicardo November 11, 2025

Modern agentic functions not often discuss to a single mannequin or a single software, so how do you retain that stack maintainable when suppliers, fashions and instruments maintain altering each few weeks. Moonshot AI’s Kosong targets this downside as an LLM abstraction layer for agent functions. Kosong unifies message buildings, asynchronous software orchestration and pluggable…

Read More Moonshot AI Releases Kosong: The LLM Abstraction Layer that Powers Kimi CLI
Agentic AI AI Agents

How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?
ByRicardo October 1, 2025October 1, 2025

In this tutorial, we stroll by the implementation of an Agentic Retrieval-Augmented Generation (RAG) system. We design it in order that the agent does extra than simply retrieve paperwork; it actively decides when retrieval is required, selects the perfect retrieval technique, and synthesizes responses with contextual consciousness. By combining embeddings, FAISS indexing, and a mock…

Read More How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval?
Agentic AI AI Agents

OpenAI Introduces GPT-5.1: Combining Adaptive Reasoning, Account Level Personalization, And Updated Safety Metrics In The GPT-5 Stack
ByRicardo November 13, 2025

OpenAI has launched GPT-5.1 as the subsequent iteration within the GPT-5 household, with 2 core variants, GPT-5.1 Instant and GPT-5.1 Thinking. The replace focuses on 3 axes, adaptive reasoning conduct, clearer explanations, and stronger management over tone and security. Model Lineup And Positioning GPT-5.1 Instant is the default conversational mannequin in ChatGPT. OpenAI describes it…

Read More OpenAI Introduces GPT-5.1: Combining Adaptive Reasoning, Account Level Personalization, And Updated Safety Metrics In The GPT-5 Stack
Agentic AI AI Agents

Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): Enabling LLM agents to Automatically Discover Reusable Tools from Any Website
ByRicardo October 24, 2025

A crew of Salesforce AI researchers launched WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent web site performance into reusable invocable instruments. It reframes browser automation round callable instruments quite than lengthy chains of clicks. Agents then name operations equivalent to search, filter, type, post_comment, and create_listing. This reduces dependence on giant…

Read More Salesforce AI Research Introduces WALT (Web Agents that Learn Tools): Enabling LLM agents to Automatically Discover Reusable Tools from Any Website
AI Agents Editors Pick

OpenAI Releases ChatGPT ‘Pulse’: Proactive, Personalized Daily Briefings for Pro Users
ByRicardo September 25, 2025

OpenAI launched ChatGPT Pulse, a proactive expertise that compiles personalised, research-backed updates every morning. In preview on cell and restricted to $200/month Pro subscribers, Pulse surfaces topical playing cards constructed from a consumer’s chats, express suggestions, and opt-in related apps (e.g., calendar/e-mail), shifting ChatGPT from a request-driven instrument to a context-aware assistant. What Pulse Actually…

Read More OpenAI Releases ChatGPT ‘Pulse’: Proactive, Personalized Daily Briefings for Pro Users
Agentic AI AI Infrastructure

Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input
ByRicardo December 19, 2025

Thinking Machines Lab has moved its Tinker training API into general availability and added 3 major capabilities, support for the Kimi K2 Thinking reasoning model, OpenAI compatible sampling, and image input through Qwen3-VL vision language models. For AI engineers, this turns Tinker into a practical way to fine tune frontier models without building distributed training…

Read More Thinking Machines Lab Makes Tinker Generally Available: Adds Kimi K2 Thinking And Qwen3-VL Vision Input

MiniMax Releases MiniMax M2: A Mini Open Model Built for Max Coding and Agentic Workflows at 8% Claude Sonnet Price and ~2x Faster

Architecture and why activation dimension issues?