MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
MiniMax formally launched MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also natively supports image and video input and desktop computer operation. The API is live now.
MiniMax M3 is available today through MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model that combines frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture, which MiniMax says makes it the first to do so. The corresponding model weights and technical report are scheduled for release within 10 days of launch.
MSA: MiniMax Sparse Attention
The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.
Sparse attention mechanisms typically add a pre-filtering stage before computing attention, which avoids the full quadratic cost. The MiniMax team states that, compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.
At the operator level, MSA uses a “KV outer gather Q” approach. KV blocks serve as the outer loop and aggregate the queries that hit them. Each block is read only once, and memory access is contiguous. The MiniMax team reports that this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.
The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. The MiniMax team reports a speedup of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context. Across multiple ablation studies, MSA matched full attention on the majority of capabilities.
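To make the scaling argument concrete, the following minimal Python sketch counts how many query-key pairs must be scored for a single attention head. It is an illustration of the general idea only, not MSA itself: the function name and the per-query budget of 4,096 keys are hypothetical values chosen for the example, and causal masking is ignored.

```python
def attention_score_pairs(seq_len, attended_per_query=None):
    """Count query-key pairs scored for one head over seq_len tokens.

    attended_per_query=None models full attention (every query scores
    every key). An integer models a sparse scheme in which a pre-filter
    leaves each query with at most that many keys (hypothetical budget).
    """
    if attended_per_query is None:
        return seq_len * seq_len
    return seq_len * min(attended_per_query, seq_len)


HYPOTHETICAL_BUDGET = 4096  # assumption for illustration, not an MSA parameter

for n in (1_000, 10_000, 100_000, 1_000_000):
    full = attention_score_pairs(n)
    sparse = attention_score_pairs(n, HYPOTHETICAL_BUDGET)
    print(f"{n:>9} tokens: full={full:.2e} pairs, sparse={sparse:.2e} pairs")
```

With full attention, a 10× longer sequence costs 100× more pairs; with a fixed per-query budget, the count grows only linearly once the sequence exceeds the budget.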
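The loop order described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MiniMax's kernel: the block-scoring rule (query dot product with each block's mean key), the function names, and the online-softmax accumulation are all choices made for this example, causal masking is omitted, and the key count is assumed to be a multiple of `block_size`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(q, k, block_size, top_k):
    """Pre-filter (assumed rule): each query keeps the top_k KV blocks,
    ranked by its dot product with the block's mean key."""
    n_blocks = len(k) // block_size
    means = []
    for b in range(n_blocks):
        blk = k[b * block_size:(b + 1) * block_size]
        means.append([sum(col) / block_size for col in zip(*blk)])
    chosen = []
    for qi in q:
        ranked = sorted(range(n_blocks),
                        key=lambda b: dot(qi, means[b]), reverse=True)
        chosen.append(set(ranked[:top_k]))
    return chosen


def sparse_attention_kv_outer(q, k, v, block_size, top_k):
    """Block-sparse attention with KV blocks as the outer loop."""
    n_blocks = len(k) // block_size
    scale = 1.0 / math.sqrt(len(q[0]))
    chosen = select_blocks(q, k, block_size, top_k)
    # Running softmax statistics per query (online softmax).
    run_max = [-math.inf] * len(q)
    run_sum = [0.0] * len(q)
    out = [[0.0] * len(v[0]) for _ in q]
    for b in range(n_blocks):  # outer loop: each KV block is read once
        start = b * block_size
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        # Gather the queries whose pre-filter selected this block.
        hits = [i for i in range(len(q)) if b in chosen[i]]
        for i in hits:
            scores = [dot(q[i], kj) * scale for kj in k_blk]
            new_max = max(run_max[i], max(scores))
            corr = math.exp(run_max[i] - new_max)  # rescale earlier partial sums
            weights = [math.exp(s - new_max) for s in scores]
            run_sum[i] = run_sum[i] * corr + sum(weights)
            out[i] = [o * corr + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                      for c, o in enumerate(out[i])]
            run_max[i] = new_max
    return [[o / run_sum[i] for o in row] for i, row in enumerate(out)]
```

Because every query's partial result is rescaled as new blocks arrive, the blocks can be visited in storage order. When `top_k` equals the number of blocks, the function reduces to ordinary dense softmax attention, which makes it easy to check against a reference.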
Coding and Agentic Benchmarks
Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.
- SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
- Terminal-Bench 2.1: 66.0%
- SWE-fficiency: 34.8%
- KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
- MCP Atlas: 74.2%
- Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
- SVG-Bench: surpasses Opus 4.7
On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).
MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.
Native Multimodality
MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added post-training. The MiniMax team reports that interleaved data (sequences where text and images are naturally intermixed) is more critical to model performance than commonly assumed. After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens.
MiniMax M3 supports image and video input and can operate a desktop computer.
Real-World Task Examples from MiniMax
MiniMax documents three internal tasks in the launch post:
Paper reproduction: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper “Learning Dynamics of LLM Finetuning” and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to carry out the reproduction across a long thread.
CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton; no reference implementation was provided. Over roughly 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck analysis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.
PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only. MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The goal was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.
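The 9.4× figure follows directly from the two utilization numbers, which a short Python check makes explicit. The function name is hypothetical, and the calculation assumes the same problem size and the same hardware peak before and after, so that the throughput ratio equals the utilization ratio.

```python
def utilization_speedup(before_pct, after_pct):
    """Speedup implied by a change in peak-utilization percentage,
    assuming identical workload and hardware peak in both measurements."""
    return after_pct / before_pct


ratio = utilization_speedup(7.6, 71.3)
print(f"{ratio:.2f}x")  # prints "9.38x", which rounds to the reported 9.4x
```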
Key Takeaways
- MiniMax M3 launched June 1, 2026; the API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
- MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
- M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
- M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.
The post MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding appeared first on MarkTechPost.
