MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2
MiniMax has formally open-sourced MiniMax M2.7, making the model weights publicly available on Hugging Face. Originally introduced on March 18, 2026, MiniMax M2.7 is MiniMax's most capable open-source model to date, and its first model to actively participate in its own development cycle, a significant shift in how large language models are built and iterated.
What is MiniMax M2.7?
MiniMax M2.7 is part of MiniMax's M2 series of Mixture-of-Experts (MoE) models. MoE is an architectural design in which only a subset of the total parameters is 'activated' during any inference pass, which makes the model significantly faster and cheaper to serve than a dense model of comparable output quality.
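The activation pattern behind MoE can be illustrated with a minimal sketch of generic top-k gating. This is an illustration of the technique only, not MiniMax's actual router; all shapes and names here are invented:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k of n experts (generic sketch).

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                    # router score for every expert
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only k experts actually run, so compute scales with k, not n_experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n))
mats = [rng.normal(size=(d, d)) for _ in range(n)]
experts = [lambda v, m=m: m @ v for m in mats]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

The key point the sketch makes concrete: the per-token cost depends on `k`, while capacity grows with the total number of experts.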
MiniMax M2.7 is built around three core capability areas: professional software engineering, professional office work, and what MiniMax calls Agent Teams, its native multi-agent collaboration. MiniMax M2.7 is capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search.
SOTA Benchmark Performance: SWE-Pro and Terminal Bench 2
On SWE-Pro, which covers multiple programming languages, MiniMax M2.7 achieved a 56.22% accuracy rate, matching GPT-5.3-Codex. SWE-Pro tasks span log analysis, bug troubleshooting, code security analysis, and machine learning workflow debugging, much closer to the messy reality of production systems than standard algorithmic coding tests.
On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), both of which demand a high degree of system-level comprehension, MiniMax M2.7 performs solidly. The model excels not only at code generation but can also deeply understand the operational logic and collaborative dynamics of software systems.
On the repo-level code generation benchmark VIBE-Pro, MiniMax M2.7 scored 55.6%, nearly on par with Opus 4.6, meaning that whether a requirement involves Web, Android, iOS, or simulation tasks, it can be handed directly to MiniMax M2.7 to complete. It also demonstrates a strong advantage on benchmarks closer to real-world engineering scenarios: SWE Multilingual (76.5) and Multi SWE Bench (52.7).
Production Debugging: Under Three Minutes
When faced with alerts in production, MiniMax M2.7 can correlate monitoring metrics with deployment timelines to perform causal reasoning, run statistical analysis on trace samples to propose precise hypotheses, proactively connect to databases to verify root causes, pinpoint missing index migration files in the code repository, and use non-blocking index creation to stop the bleeding before submitting a merge request. The MiniMax team reports that on several occasions this reduced recovery time for live production incidents to under three minutes. From observability analysis and database expertise to SRE-level decision-making, this positions MiniMax M2.7 as something beyond a code-generation model.
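The first step of that workflow, correlating a metric anomaly with a deployment timeline, can be sketched in a few lines. The data and thresholds below are invented for illustration; this is not MiniMax's internal tooling:

```python
from datetime import datetime, timedelta

# Hypothetical observability data: p99 latency samples and one deploy event.
deploys = [("api-v2.3.1", datetime(2026, 3, 18, 14, 0))]
latency_ms = [
    (datetime(2026, 3, 18, 13, 50), 120),
    (datetime(2026, 3, 18, 13, 55), 118),
    (datetime(2026, 3, 18, 14, 5), 940),   # spike shortly after the deploy
    (datetime(2026, 3, 18, 14, 10), 1020),
]

def suspect_deploys(deploys, samples, threshold_ms=500, window=timedelta(minutes=15)):
    """Flag deployments that are followed by a latency spike within `window`."""
    return [
        name for name, when in deploys
        if any(when <= t <= when + window and v > threshold_ms for t, v in samples)
    ]

print(suspect_deploys(deploys, latency_ms))  # ['api-v2.3.1']
```

An agent would then treat each flagged deploy as a hypothesis to verify against the database and the repository's migration files, rather than as a confirmed root cause.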
The Self-Evolution Architecture
To test the boundaries of autonomous improvement, MiniMax M2.7 was tasked with optimizing a model's programming performance on an internal scaffold. It ran entirely autonomously, executing an iterative loop of 'analyze failure trajectories → plan modifications → modify scaffold code → run evaluations → review results → decide to keep or revert changes' for over 100 rounds. During this process, MiniMax M2.7 discovered effective optimizations on its own: systematically searching for the optimal combination of sampling parameters such as temperature, frequency penalty, and presence penalty; designing more specific workflow guidelines (such as routinely searching for the same bug pattern in other files after a fix); and adding loop detection to the scaffold's agent loop. This achieved a 30% performance improvement on internal evaluation sets.
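The keep-or-revert loop can be sketched generically. Everything here is a placeholder under stated assumptions (an evaluable scaffold state and a patch-proposal step); it is not MiniMax's code:

```python
def self_evolve(scaffold, evaluate, propose_patch, rounds=100):
    """Generic keep-or-revert optimization loop (sketch only).

    scaffold      : the mutable state being optimized
    evaluate      : scaffold -> score (higher is better)
    propose_patch : (scaffold, best_score) -> candidate scaffold
    """
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_patch(scaffold, best_score)  # plan + modify
        score = evaluate(candidate)                      # run evaluations
        if score > best_score:                           # review results
            scaffold, best_score = candidate, score      # keep the change
        # else: revert, i.e. the candidate is simply discarded
    return scaffold, best_score

# Toy demo: the "scaffold" is a sampling temperature; the optimum is 0.7.
evaluate = lambda t: -abs(t - 0.7)
propose = lambda t, _: round(t - 0.05, 2) if t > 0.7 else round(t + 0.05, 2)
final_t, score = self_evolve(1.0, evaluate, propose, rounds=100)
print(final_t)  # 0.7
```

The greedy accept/revert rule is what makes long autonomous runs safe: a bad patch can never degrade the best known configuration.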
Within MiniMax's own reinforcement learning team workflows, M2.7 now handles 30%–50% of the workflow end-to-end, with human researchers stepping in only for critical decisions and discussions.
MLE Bench Lite: Testing Autonomous ML Experimentation
The MiniMax team also tested MiniMax M2.7 on MLE Bench Lite, OpenAI's open-sourced suite of 22 machine learning competitions runnable on a single A30 GPU, covering nearly all phases of the ML workflow.
For this evaluation, the MiniMax team designed a simple three-component harness: short-term memory, self-feedback, and self-optimization. After each iteration round, the agent generates a short-term memory markdown file, performs self-criticism on the current results, and provides optimization directions for the next round. Three trials were run, each with a 24-hour window for iterative evolution.
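A harness of that shape can be sketched as follows. The structure (memory file, self-feedback note, best-result tracking) follows the description above, but the agent interface, file name, and scoring are invented for illustration:

```python
from pathlib import Path

def run_harness(agent, task, rounds=3, memory_path=Path("memory.md")):
    """Three-component harness sketch: short-term memory, self-feedback,
    self-optimization. `agent(prompt) -> (solution, score)` is a placeholder."""
    memory = ""
    best = None
    for i in range(rounds):
        prompt = f"Task: {task}\n\nNotes from previous rounds:\n{memory}"
        solution, score = agent(prompt)          # attempt the task this round
        if best is None or score > best[1]:
            best = (solution, score)
        # Self-feedback: record the result and a direction for the next round.
        memory += f"\n## Round {i + 1}\nScore: {score}\nNext: improve weakest step.\n"
        memory_path.write_text(memory)           # persist the short-term memory
    return best

# Toy agent whose score grows as its accumulated notes grow.
toy_agent = lambda prompt: ("solution", prompt.count("## Round"))
solution, score = run_harness(toy_agent, "classify images", rounds=3)
print(score)  # 2
```

The markdown memory file is the only state carried between rounds, which is what makes each 24-hour trial resumable and inspectable.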
The best run achieved 9 gold medals, 5 silver medals, and 1 bronze medal. The average medal rate across the three runs was 66.6%, a result behind only Opus-4.6 (75.7%) and GPT-5.4 (71.2%), and tied with Gemini-3.1 (66.6%).
Professional Office Work and Finance
Beyond software engineering, MiniMax M2.7 targets professional office tasks. In the GDPval-AA evaluation, which measures domain expertise and task-delivery capability across 45 models, MiniMax M2.7 achieved an ELO score of 1495, the highest among open-source models, behind only Opus 4.6, Sonnet 4.6, and GPT-5.4, and surpassing GPT-5.3.
On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, placing it in the global top tier. In MM Claw testing, an evaluation MiniMax built from real-world usage patterns on the OpenClaw personal agent platform, MiniMax M2.7 maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) and achieved an overall accuracy of 62.7%, approaching Sonnet 4.6.
In finance, MiniMax M2.7 can autonomously read a company's annual reports and earnings call transcripts, cross-reference multiple research reports, independently design assumptions and build a revenue forecast model, and produce PPT and Word research reports from templates, understanding, making judgments, and producing output like a junior analyst.
Key Takeaways
- MiniMax M2.7 is now officially open source, with weights available on Hugging Face, making a frontier-grade agentic model freely accessible for developers to deploy and build on.
- MiniMax M2.7 achieves SOTA performance on real-world software engineering benchmarks, scoring 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2, tests that measure production-level reasoning, not just code generation.
- MiniMax M2.7 is the first model to actively participate in its own development, running over 100 autonomous rounds of scaffold optimization and achieving a 30% performance improvement, an early, concrete example of AI-assisted AI development in practice.
- The model is built for real agentic deployments, maintaining 97% skill adherence across 40 complex skills (each exceeding 2,000 tokens), supporting native Agent Teams with stable role boundaries, and handling 30–50% of MiniMax's internal RL team workflows autonomously.
- MiniMax M2.7 is the highest-ranked open-source model on GDPval-AA with an ELO score of 1495 across 45 models, demonstrating strong professional work capabilities spanning office document editing, financial analysis, and multi-round high-fidelity task delivery.
Check out the Technical details and Model Weights.
The post MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2 appeared first on MarkTechPost.
