|

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax formally launched MiniMax M3 on June 1, 2026. The mannequin introduces MSA (MiniMax Sparse Attention), a brand new sparse consideration structure that provides M3 a 1M-token context window. M3 additionally helps picture and video enter and desktop laptop operation natively. The API is reside now.

MiniMax M3 is accessible right this moment by way of MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the following mannequin within the M-series line after M2.7. MiniMax positions M3 as an open-weight mannequin combining frontier-level coding efficiency, a 1M-token context window, and native multimodal enter in a single structure — the primary to take action, per MiniMax. The corresponding mannequin weights and technical report are scheduled for launch inside 10 days of launch.

MSA: MiniMax Sparse Attention

The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full consideration has quadratic computational complexity: as context size grows, compute value grows because the sq. of the sequence size. MSA is designed to deal with this.

Sparse consideration mechanisms usually add a pre-filtering stage earlier than computing consideration, avoiding full quadratic value. MiniMax staff states that in comparison with approaches like DSA and MoBA, MSA partitions the KV cache into blocks extra exactly, reaching greater efficient context protection.

At the operator degree, MSA makes use of a “KV outer collect Q” strategy. KV blocks function the outer loop to combination the queries that hit them. Each block is learn solely as soon as and reminiscence entry is contiguous. MiniMax staff experiences that is greater than 4× quicker than open-source implementations equivalent to Flash-Sparse-Attention and flash-moba below MiniMax M3’s head configuration.

The end result: at a context size of 1 million tokens, MiniMax M3’s per-token compute is 1/twentieth that of the previous-generation M2 fashions. MiniMax staff experiences a speedup of greater than 9× within the prefill stage and greater than 15× within the decoding stage at 1M-token context. Across a number of ablation research, MSA matched full consideration on the vast majority of capabilities.

Coding and Agentic Benchmarks

Coding and agentic capabilities are key areas of enchancment for M3. The benchmark outcomes under are reported by MiniMax staff. Several evaluations have been run on MiniMax inner infrastructure, whereas some comparability scores have been taken from official leaderboards or exterior benchmark sources, as famous in MiniMax’s methodology. SWE-Bench Verified was examined on inner infrastructure utilizing Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was additionally examined on inner infrastructure utilizing Claude Code scaffolding, with testing logic aligned to the official analysis.

  • SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
  • Terminal-Bench 2.1: 66.0%
  • SWE-fficiency: 34.8%
  • KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA functionality sm_120)
  • MCP Atlas: 74.2%
  • Claw-Eval: highest rating amongst fashions evaluated (General Task Group, 161 duties)
  • SVG-Bench: surpasses Opus 4.7

On OmniDocBench, a multimodal doc understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% activity completion fee for laptop use (Max Steps = 200).

MiniMax additionally constructed an interactive consumer simulator framework for coaching and analysis. It simulates multi-turn developer collaboration: requirement elaboration, answer dialogue, feedback-based correction, steady activity switching, and multi-round challenge iteration. This is meant to scale back the hole between single-turn benchmark efficiency and real-world, multi-turn developer workflows.

Native Multimodality

MiniMax M3 underwent mixed-modality coaching from step 0. Text, photographs, and video are skilled collectively from the start fairly than added post-training. MiniMax staff experiences that interleaved information — sequences the place textual content and photographs are naturally intermixed — is extra crucial to mannequin efficiency than generally assumed. After rebuilding your entire information pipeline for interleaved codecs, coaching information was scaled to the order of 100 trillion tokens.

MiniMax M3 helps picture and video enter and can function a desktop laptop.

Real-World Task Examples from MiniMax

MiniMax paperwork three inner duties within the launch submit:

Paper copy: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper Learning Dynamics of LLM Finetuning and requested it to breed the experiments independently. M3 ran autonomously for practically 12 hours, produced 18 commits and 23 experimental figures, and accomplished the core experiments with out human intervention. It required multimodal functionality to learn curves and formulation, lengthy context to carry the paper and experiment logs concurrently, and coding functionality to execute the copy throughout a protracted thread.

CUDA kernel optimization: MiniMax requested MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper structure GPUs. The mannequin began with solely a activity description, a benchmark analysis script, and a non-functional Triton skeleton — no reference implementation was supplied. Over roughly 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 software calls. It progressed by way of baseline implementation, autotune configuration technology, efficiency bottleneck analysis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 {hardware} peak utilization from 7.6% to 71.3%, a 9.4× speedup. The greatest answer appeared on the 145th submission. MiniMax notes that almost all different fashions stopped making new progress inside the first 30 submissions; solely Opus 4.7 and M3 continued past that time.

PostTrainBench (autonomous mannequin coaching): MiniMax gave MiniMax M3 4 base fashions that had accomplished pretraining solely. MiniMax M3 autonomously ran the total information synthesis → coaching → analysis → iteration cycle over 12 hours with no human intervention. The goal was for the bottom fashions to accumulate capabilities throughout mathematical reasoning (AIME2025), software calling (BFCL), scientific data reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code technology (HumanEval). MiniMax M3 scored 0.37, under Opus 4.7 (0.42) and GPT-5.5 (0.39), however forward of the opposite fashions examined.

Marktechpost’s Visual Explainer

Overview

MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

MiniMax formally launched M3 on June 1, 2026. The API is reside now. Model weights and technical report might be open-sourced inside 10 days.

M3 is the following mannequin within the M-series line after M2.7. MiniMax positions it as the primary open-weight mannequin to mix all three of the next in a single structure:

1M
Token Context Window
59.0%
SWE-Bench Pro Score
MSA
Sparse Attention Architecture
70.06%
OSWorld-Verified (Computer Use)

Architecture

MSA: MiniMax Sparse Attention

Standard full consideration has quadratic computational complexity. As context size grows, compute value grows because the sq. of the sequence size. MSA is designed to unravel this on the operator degree.

Compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks extra exactly, reaching greater efficient context protection.

MSA makes use of a “KV outer collect Q” strategy — every KV block is learn solely as soon as, reminiscence entry is contiguous, and arithmetic depth is considerably higher than widespread strategies.

>9×
Prefill Speedup at 1M ctx
>15×
Decoding Speedup at 1M ctx
1/20
Per-token compute vs M2 at 1M
>4×
Faster than Flash-Sparse-Attn

Benchmarks

Coding and Agentic Performance

Results reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. SWE-Bench Pro used Claude Code scaffolding, aligned to official analysis.

  • SWE-Bench Pro: 59.0% — surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7
  • Terminal-Bench 2.1: 66.0%
  • SWE-fficiency: 34.8%
  • KernelBench Hard: 28.8% — evaluated on NVIDIA Blackwell GPUs (sm_120)
  • MCP Atlas: 74.2%
  • Claw-Eval: Highest rating amongst fashions evaluated (161 duties)
  • SVG-Bench: Surpasses Opus 4.7
  • OmniDocBench: Above Gemini 3.1 Pro
  • OSWorld-Verified: 70.06% — 361 samples, Max Steps = 200

Multimodality

Native Multimodal Training from Step 0

M3 underwent mixed-modality coaching from step 0. Text, photographs, and video are skilled collectively from the beginning — not added as a post-training functionality.

MiniMax experiences that interleaved information — sequences the place textual content and photographs are naturally intermixed — is extra crucial to mannequin efficiency than generally assumed.

After rebuilding your entire information pipeline for interleaved codecs, coaching information was scaled to the order of 100 trillion tokens.

M3 helps:

  • Image enter
  • Video enter
  • Desktop laptop operation (laptop use)

Real-World Tasks

Three Internal Tasks Documented by MiniMax

  • Paper Reproduction — M3 reproduced the ICLR 2025 paper Learning Dynamics of LLM Finetuning autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.
  • CUDA Kernel Optimization — M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 software calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from 7.6% → 71.3% (9.4× speedup). Best answer appeared on submission 145.
  • PostTrainBench — M3 autonomously ran information synthesis → coaching → analysis → iteration for 4 base fashions over 12 hours. Scored 0.37, under Opus 4.7 (0.42) and GPT-5.5 (0.39), however forward of different evaluated fashions. Targets: AIME2025, BFCL, GPQA Main, GSM8K, HumanEval.

MiniMax Code

MiniMax Code: Agent Product Built and Trained with M3

MiniMax Code is an agent product constructed and skilled collectively with M3. Available at agent.minimaxi.com/obtain. Works with MiniMax Token Plans.

  • Agent Teams — a number of brokers run concurrent, multi-stage, dynamically adjustable workflows
  • Producer + Verifier loop — adversarial harness allows steady self-correction throughout execution
  • Computer use — M3’s native multimodal functionality allows cross-application desktop automation
  • Built on OpenCode and Pi — MiniMax states it plans to open-source MiniMax Code sooner or later
// Example use case
User (on telephone): “Open the native ERP shopper
and batch-enter bill information from this Excel file.”
→ MiniMax Code handles operations throughout
purposes, recordsdata, and techniques on desktop.

API & Pricing

API Details and Token Plan Tiers

The M3 API is reside at platform.minimax.io.

Pricing by enter size: Calls ≤512K tokens → customary fee. Calls >512K → greater long-context fee.

Thinking mode: Toggle on/off at request time. Both modes share the identical pricing.

Service tiers: customary (default) and precedence (service_tier=precedence) — precedence out there by way of gross sales, opening to all customers quickly.

Plus
~1.7B tokens/mo
$20/mo
Max
~5.1B tokens/mo
$50/mo
Ultra
~9.8B tokens/mo
$120/mo

Text, picture, speech, and music utilization all draw from the identical token pool.

Key Takeaways

What Engineers and Researchers Need to Know

  • MiniMax M3 launched June 1, 2026. API is reside. Open mannequin weights and technical report dedicated inside 10 days.
  • MSA delivers >9× prefill and >15× decoding speedup at 1M-token context vs M2, at 1/twentieth the per-token compute.
  • M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
  • Natively multimodal from step 0 — helps picture, video enter, and 70.06% on OSWorld-Verified for laptop use.
  • Thinking mode toggleable at request time. Token Plan begins at $20/month (~1.7B M3 tokens).

1 / 8


Key Takeaways

  • MiniMax M3 launched June 1, 2026; API is reside now. MiniMax has dedicated to releasing open mannequin weights and a technical report inside 10 days.
  • MSA (MiniMax Sparse Attention) delivers greater than 9× prefill and greater than 15× decoding speedup at 1M-token context versus M2, at 1/twentieth the per-token compute.
  • M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
  • M3 is natively multimodal from step 0, supporting picture and video enter, and achieves 70.06% on OSWorld-Verified for laptop use.


Check out the Technical detailsAlso, be at liberty to comply with us on Twitter and don’t overlook to hitch our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Need to associate with us for selling your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar and so forth.? Connect with us

The submit MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding appeared first on MarkTechPost.

Similar Posts