Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps
Moonshot AI, the Chinese AI lab behind the Kimi assistant, has open-sourced Kimi K2.6, a natively multimodal agentic model that pushes the boundaries of what an AI system can do when left to run autonomously on hard software engineering problems. The release targets practical deployment scenarios: long-running coding agents, front-end generation from natural language, massively parallel agent swarms coordinating hundreds of specialized sub-agents concurrently, and a new open ecosystem where humans and agents from any system collaborate on the same task. The model is available now on Kimi.com, the Kimi App, the API, and the Kimi Code CLI. Weights are published on Hugging Face under a Modified MIT License.
What Kind of Model is This, Technically?
Kimi K2.6 is a Mixture-of-Experts (MoE) model, an architecture that has become increasingly dominant at frontier scale. Instead of activating all of a model's parameters for every token it processes, an MoE model routes each token to a small subset of specialized 'experts.' This lets you build a very large model while keeping inference compute tractable.
Kimi K2.6 has 1 trillion total parameters, but only 32 billion are activated per token. It has 384 experts in total, with 8 selected per token, plus 1 shared expert that is always active. The model has 61 layers (including one dense layer) and uses an attention hidden dimension of 7,168, an MoE hidden dimension of 2,048 per expert, and 64 attention heads.
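To make the routing concrete, here is a toy sketch of top-k expert selection under the stated configuration (384 routed experts, 8 selected per token, 1 shared expert). It is illustrative only, not Moonshot's implementation; a real router is a learned gating network over hidden states.

```python
# Toy sketch of MoE top-k routing: 384 routed experts, 8 chosen per
# token, plus one always-active shared expert.
NUM_EXPERTS = 384
TOP_K = 8

def select_experts(router_scores):
    """Return indices of the TOP_K highest-scoring routed experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return ranked[:TOP_K]

def moe_forward(x, router_scores, experts, shared_expert):
    """Sum the shared expert's output with the selected experts' outputs."""
    out = shared_expert(x)
    for i in select_experts(router_scores):
        out += experts[i](x)
    return out

# Usage with trivial scalar "experts" (expert i multiplies by i):
experts = [lambda x, w=i: w * x for i in range(NUM_EXPERTS)]
shared = lambda x: 2 * x
y = moe_forward(1.0, list(range(NUM_EXPERTS)), experts, shared)
# Selected experts are 376..383, so y = 2 + sum(376..383) = 3038.0
```

The point of the shared expert is that some computation runs for every token regardless of routing, while the routed experts specialize; only 8 of 384 expert MLPs execute per token, which is how 1T total parameters collapse to 32B active.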
Beyond text, K2.6 is a natively multimodal model, meaning vision is baked in architecturally rather than bolted on. It uses a MoonViT vision encoder with 400M parameters and supports image and video input natively. Other architectural details: Multi-head Latent Attention (MLA) as the attention mechanism, SwiGLU as the activation function, a vocabulary size of 160K tokens, and a context length of 256K tokens.
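For reference, SwiGLU gates one linear projection of the hidden state with a Swish-activated second projection. A scalar sketch (not K2.6's actual code, where both inputs are vector projections inside the FFN):

```python
import math

def swish(x):
    """Swish / SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu(gate, up):
    """SwiGLU gates the 'up' projection with a Swish-activated gate.

    In a transformer FFN, `gate` and `up` are two linear projections
    of the same hidden state; their elementwise product feeds the
    down projection.
    """
    return swish(gate) * up
```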
For deployment, K2.6 is recommended to run on vLLM, SGLang, or KTransformers. It shares the same architecture as Kimi K2.5, so existing deployment configurations can be reused directly. The required transformers version is >=4.57.1, <5.0.0.
The Long-Horizon Coding Headline Numbers
The metric likely to get the most attention from dev teams is SWE-Bench Pro, a benchmark testing whether a model can resolve real-world GitHub issues in professional software repositories.
Kimi K2.6 scores 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4 (xhigh), 53.4 for Claude Opus 4.6 (max effort), 54.2 for Gemini 3.1 Pro (thinking high), and 50.7 for Kimi K2.5. On SWE-Bench Verified it scores 80.2, sitting within a tight band of top-tier models.
On Terminal-Bench 2.0 using the Terminus-2 agent framework, K2.6 achieves 66.7, compared to 65.4 for both GPT-5.4 and Claude Opus 4.6, and 68.5 for Gemini 3.1 Pro. On LiveCodeBench (v6), it scores 89.6 vs. Claude Opus 4.6's 88.8.
Perhaps the most striking number for agentic workloads is Humanity's Last Exam (HLE-Full) with tools: K2.6 scores 54.0, leading every model in the comparison, including GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4). HLE is widely considered one of the hardest knowledge benchmarks, and the with-tools variant specifically tests how well a model can leverage external resources autonomously. Internally, Moonshot evaluates long-horizon coding gains using its Kimi Code Bench, an internal benchmark covering diverse, complicated end-to-end tasks across languages and domains, where K2.6 shows significant improvements over K2.5.

What 13 Hours of Autonomous Coding Actually Looks Like
Two engineering case studies in the release document what 'long-horizon coding' means in practice.
In the first, Kimi K2.6 downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, then implemented and optimized model inference in Zig, a highly niche programming language, demonstrating exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, K2.6 improved throughput from roughly 15 to roughly 193 tokens/sec, ultimately reaching speeds roughly 20% faster than LM Studio.
In the second, Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. Acting as an expert systems architect, K2.6 analyzed CPU and allocation flame graphs to pinpoint hidden bottlenecks and reconfigured the core thread topology from 4ME+2RE to 2ME+1RE, extracting a 185% median throughput jump (from 0.43 to 1.24 MT/s) and a 133% peak throughput gain (from 1.23 to 2.86 MT/s).
Agent Swarms: Scaling Horizontally, Not Just Vertically
One of K2.6's most architecturally interesting capabilities is its Agent Swarm, an approach that parallelizes complex tasks across many specialized sub-agents rather than relying on a single, deeper reasoning chain.
The architecture scales horizontally to 300 sub-agents executing 4,000 coordinated steps concurrently, a substantial expansion from K2.5's 100 sub-agents and 1,500 steps. The swarm dynamically decomposes tasks into heterogeneous subtasks, combining broad web search with deep research, large-scale document analysis with long-form writing, and multi-format content generation in parallel, then delivers consolidated outputs including documents, websites, slides, and spreadsheets within a single autonomous run. The swarm also introduces a concrete Skills capability: it can convert any high-quality PDF, spreadsheet, slide deck, or Word document into a reusable Skill. K2.6 captures and maintains the document's structural and stylistic DNA, allowing it to reproduce the same quality and format in future tasks; think of it as teaching the swarm by example rather than by prompt.
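The decompose-and-fan-out pattern described above can be sketched with asyncio. This is purely illustrative, with invented subtask names; Moonshot has not published the swarm's internals.

```python
import asyncio

# Illustrative fan-out/fan-in skeleton for a swarm coordinator:
# decompose a goal into heterogeneous subtasks, run sub-agents
# concurrently, then consolidate their outputs.

async def sub_agent(name: str, subtask: str) -> str:
    await asyncio.sleep(0)  # stand-in for real tool calls and model steps
    return f"{name} finished: {subtask}"

def decompose(goal: str) -> list[str]:
    # A real coordinator would decompose dynamically from the goal;
    # these subtask categories mirror the ones named in the text.
    return [f"{goal} / web search",
            f"{goal} / document analysis",
            f"{goal} / long-form writing"]

async def run_swarm(goal: str) -> list[str]:
    subtasks = decompose(goal)
    agents = [sub_agent(f"agent-{i}", st) for i, st in enumerate(subtasks)]
    # Fan-in: gather preserves subtask order for the consolidated output.
    return await asyncio.gather(*agents)

results = asyncio.run(run_swarm("market report"))
```

The heterogeneity claim in the release maps onto `decompose`: each sub-agent can run a different kind of work (search, analysis, writing) while the coordinator only sees their consolidated results.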
Concrete demonstrations include: a 100-sub-agent run that matched a single uploaded CV against 100 relevant roles in California and delivered 100 fully customized resumes; another that identified 30 retail stores in Los Angeles without websites from Google Maps listings and generated landing pages for each; and one that turned an astrophysics paper into a reusable academic skill and then produced a 40-page, 7,000-word research paper alongside a structured dataset with 20,000+ entries and 14 astronomy-grade charts.
On the BrowseComp benchmark in Agent Swarm mode, K2.6 scores 86.3, compared to 78.4 for Kimi K2.5. On DeepSearchQA (F1 score), K2.6 scores 92.5 against 78.6 for GPT-5.4.
Bring Your Own Agents: Claw Groups
Beyond Moonshot's own swarm infrastructure, K2.6 introduces Claw Groups as a research preview, a new feature that opens the agent swarm architecture to an external, heterogeneous ecosystem.
The key design principle: multiple agents and humans operate as genuine collaborators in a shared operational space. Users can onboard agents from any system, running any model, each carrying its own specialized toolkit, skills, and persistent memory context, whether deployed on local laptops, mobile devices, or cloud instances. At the center of this swarm, K2.6 serves as an adaptive coordinator: it dynamically matches tasks to agents based on their specific skill profiles and available tools, detects when an agent fails or stalls, automatically reassigns the task or regenerates subtasks, and manages the full lifecycle of deliverables from initiation through validation to completion.
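A rough mental model of that coordination loop, skill-based matching plus reassignment on failure, can be sketched as follows. The agent records and the matching rule are invented for illustration, since Claw Groups' internal protocol is not public.

```python
# Sketch of skill-based task assignment with failure reassignment.

def match_agent(task_skills: set, agents: list):
    """Pick the first healthy agent whose skills cover the task."""
    for agent in agents:
        if agent["healthy"] and task_skills <= agent["skills"]:
            return agent
    return None

def assign_with_retry(task_skills, agents):
    """Assign a task; if the chosen agent stalls, mark it down and reassign."""
    while (agent := match_agent(task_skills, agents)) is not None:
        if agent["run"]():            # True = task completed
            return agent["name"]
        agent["healthy"] = False      # detected failure/stall -> reassign
    return None                       # no capable agent left

agents = [
    {"name": "laptop-bot", "skills": {"code"}, "healthy": True,
     "run": lambda: False},           # this agent stalls
    {"name": "cloud-bot", "skills": {"code", "deploy"}, "healthy": True,
     "run": lambda: True},
]
winner = assign_with_retry({"code"}, agents)  # falls through to "cloud-bot"
```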
Moonshot has been using Claw Groups internally to run its own content production and launch campaigns, with specialized agents including Demo Makers, Benchmark Makers, Social Media Agents, and Video Makers working in parallel, coordinated by K2.6. For devs interested in multi-agent orchestration architectures, this is worth looking into: it represents a shift from 'AI does tasks for you' to 'AI coordinates a team of heterogeneous agents, some of which you built, on your behalf.'
Proactive Agents: 5 Days of Autonomous Operation
K2.6 demonstrates strong performance in persistent, proactive agents such as OpenClaw and Hermes, which operate across multiple applications with continuous, 24/7 execution. These workflows require AI to proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.
Moonshot's own RL infrastructure team used a K2.6-backed agent that operated autonomously for five days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution.
Performance in this regime is measured by an internal Claw Bench, an evaluation suite spanning five domains: Coding Tasks, IM Ecosystem Integration, Information Research & Analysis, Scheduled Task Management, and Memory Utilization. Across all five, K2.6 significantly outperforms K2.5 in task completion rates and tool invocation accuracy, particularly in workflows requiring sustained autonomous operation without human oversight.
Two Operational Modes: Thinking and Instant
For devs integrating via the API, K2.6 exposes two inference modes that matter for latency/quality tradeoffs:
Thinking mode activates full chain-of-thought reasoning: the model reasons through a problem before producing a final answer. This is useful for complex coding and agentic tasks, with a recommended temperature of 1.0. There is also a preserve-thinking mode, which retains full reasoning content across multi-turn interactions and improves performance in coding agent scenarios; it is disabled by default, but worth enabling when building agents that need to maintain coherent reasoning state across many steps.
Instant mode disables extended reasoning for lower-latency responses. To use Instant mode via the official API, pass {'thinking': {'type': 'disabled'}} in extra_body. For vLLM or SGLang deployments, pass {'chat_template_kwargs': {"thinking": False}} instead, with a recommended temperature of 0.6 and top-p of 0.95.
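A sketch of how the two modes map onto request payloads. Only the `extra_body` contents and sampling parameters come from the release notes; the model ID "kimi-k2.6" is a placeholder, and the resulting dict would be passed as keyword arguments to an OpenAI-compatible `chat.completions.create` call.

```python
def build_request(prompt: str, mode: str = "thinking",
                  backend: str = "official") -> dict:
    """Build kwargs for an OpenAI-compatible chat completion call.

    Sketch only: the model ID is a placeholder; the thinking/instant
    switches follow the modes described in the text.
    """
    req = {
        "model": "kimi-k2.6",  # placeholder model ID
        "messages": [{"role": "user", "content": prompt}],
    }
    if mode == "thinking":
        req["temperature"] = 1.0  # recommended for Thinking mode
    else:
        # Instant mode: skip extended reasoning for lower latency.
        req["temperature"] = 0.6
        req["top_p"] = 0.95
        if backend == "official":
            req["extra_body"] = {"thinking": {"type": "disabled"}}
        else:
            # vLLM / SGLang deployments use a chat-template flag instead.
            req["extra_body"] = {"chat_template_kwargs": {"thinking": False}}
    return req

instant = build_request("Summarize this diff.", mode="instant")
```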
Key Takeaways
- Kimi K2.6 is a 1-trillion-parameter, natively multimodal MoE model with only 32B parameters activated per token, released fully open-source under a Modified MIT License.
- K2.6 leads all frontier models on HLE-Full with tools (54.0), outperforming GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4) on one of AI's hardest agentic benchmarks.
- In real-world tests, K2.6 autonomously overhauled an 8-year-old financial matching engine over 13 hours, delivering a 185% median throughput jump and a 133% peak throughput gain.
- The Agent Swarm architecture scales to 300 sub-agents executing 4,000 coordinated steps concurrently, and can convert any PDF, spreadsheet, or slide deck into a reusable Skill that preserves structural and stylistic DNA.
- Claw Groups, launched as a research preview, lets humans and agents from any system, running any model, collaborate in a shared swarm, with K2.6 serving as an adaptive coordinator that dynamically assigns tasks, detects failures, and manages full delivery lifecycles.
Check out the Model Weights, API Access, and Technical details.
The post Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps appeared first on MarkTechPost.
