Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

This week, Moonshot AI released Kimi K2.7-Code, a coding-focused, agentic model. The model weights ship on Hugging Face under a Modified MIT license. You can also reach it through the Kimi API and Kimi Code.

K2.7-Code targets long-horizon software engineering, not general chat. It plans, edits, runs tools, and debugs across many steps. Moonshot pairs the model with a subscription coding platform built around it.

Kimi K2.7-Code

K2.7-Code is a Mixture-of-Experts model. It holds 1T total parameters and activates 32B per token. The design uses 384 experts, with 8 selected per token and 1 shared. It has 61 layers, including 1 dense layer.

Attention uses MLA (Multi-head Latent Attention), and the feed-forward path uses SwiGLU. A MoonViT vision encoder adds 400M parameters for image and video input. The model ships with native INT4 quantization. The context window is 256K tokens (262,144).
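To make the sparsity concrete, here is a minimal sketch of top-k expert selection using the counts quoted above (384 routed experts, 8 selected per token, 1 shared). The router scores are random stand-ins, and weighting the selected experts with a softmax over their scores is a common convention assumed for illustration. It is not taken from Moonshot's implementation.

```python
import math
import random

NUM_ROUTED_EXPERTS = 384   # from the model description
TOP_K = 8                  # routed experts selected per token
NUM_SHARED_EXPERTS = 1     # always active

def route_token(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only (an assumed convention).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in scores
selected = route_token(logits)

active_experts = len(selected) + NUM_SHARED_EXPERTS
print(f"experts active for this token: {active_experts}")
print(f"active parameter share: {32e9 / 1e12:.1%}")
```

Only 9 experts run for any given token, which is how a 1T-parameter model ends up activating about 3.2% of its weights per token.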

Two constraints matter. Thinking mode is mandatory; disabling it returns an API error. Sampling is fixed: temperature 1.0, top_p 0.95, n 1, penalties 0.0. The default max output is 32,768 tokens.

You can self-host with vLLM, SGLang, or KTransformers. The Hugging Face repository is large, roughly 595 GB on disk. This is a server-class deployment target, not a laptop model.
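A client can catch these constraints before a request leaves the machine. The sketch below checks a request dictionary against the fixed values listed above. The field names (`temperature`, `top_p`, `n`, `presence_penalty`, `frequency_penalty`, `thinking`) follow common OpenAI-style conventions and are assumptions here, not confirmed Kimi API fields. `check_request` is a hypothetical helper.

```python
# Fixed sampling values as reported in the article; the field names are assumed.
FIXED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}

def check_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request looks compatible."""
    problems = []
    for field, required in FIXED_SAMPLING.items():
        # A missing field is fine: the server-side default would apply.
        if field in request and request[field] != required:
            problems.append(f"{field} must be {required}, got {request[field]}")
    # Hypothetical flag: thinking mode cannot be turned off.
    if request.get("thinking") is False:
        problems.append("thinking mode is mandatory and cannot be disabled")
    return problems

ok = check_request({"temperature": 1.0, "top_p": 0.95})
bad = check_request({"temperature": 0.2, "thinking": False})
print(ok)
print(bad)
```

Failing locally like this is cheaper than discovering the restriction through an API error in the middle of an agent run.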
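A rough back-of-envelope check shows why the download is this large. If every one of the 1T parameters were stored at 4 bits, the weights alone would need about 500 GB. The remaining gap to the reported 595 GB would then come from tensors kept at higher precision and other repository files. That split is an assumption for illustration, not a published breakdown.

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters (reported)
BITS_PER_PARAM = 4                 # INT4, assumed here for every parameter
REPORTED_REPO_GB = 595             # approximate on-disk size (reported)

# Decimal gigabytes: bits -> bytes -> GB
int4_weight_gb = TOTAL_PARAMS * BITS_PER_PARAM / 8 / 1e9
remainder_gb = REPORTED_REPO_GB - int4_weight_gb

print(f"all-INT4 weights: {int4_weight_gb:.0f} GB")
print(f"remainder not explained by INT4 weights: {remainder_gb:.0f} GB")
```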

Benchmark

Moonshot published six benchmark rows. They compare K2.7-Code against K2.6, GPT-5.5, and Claude Opus 4.8. K2.7-Code beats K2.6 on every row. The largest coding jump is on Kimi Code Bench v2, from 50.9 to 62.0, a gain of 11.1 points.

Benchmark	Kimi K2.6	Kimi K2.7-Code	GPT-5.5	Claude Opus 4.8	K2.7 vs K2.6
Kimi Code Bench v2	50.9	62.0	69.0	67.4	+21.8%
Program Bench	48.3	53.6	69.1	63.8	+11.0%
MLS Bench Lite	26.7	35.1	35.5	42.8	+31.5%
Kimi Claw 24/7 Bench	42.9	46.9	52.8	50.4	+9.3%
MCP Atlas	69.4	76.0	79.4	81.3	+9.5%
MCP Mark Verified	72.8	81.1	92.9	76.4	+11.4%
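The last column is a relative gain, not a difference in points. The short check below recomputes it from the K2.6 and K2.7-Code scores in the table.

```python
# (benchmark, K2.6 score, K2.7-Code score) copied from the table above
ROWS = [
    ("Kimi Code Bench v2", 50.9, 62.0),
    ("Program Bench", 48.3, 53.6),
    ("MLS Bench Lite", 26.7, 35.1),
    ("Kimi Claw 24/7 Bench", 42.9, 46.9),
    ("MCP Atlas", 69.4, 76.0),
    ("MCP Mark Verified", 72.8, 81.1),
]

def relative_gain_pct(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0

gains = {name: round(relative_gain_pct(old, new), 1) for name, old, new in ROWS}
for name, pct in gains.items():
    print(f"{name}: +{pct}%")
```

Every recomputed value matches the published column, so the headline +21.8% is the relative change on Kimi Code Bench v2.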

K2.7-Code does beat Opus 4.8 on MCP Mark Verified, 81.1 versus 76.4. It also lands close to GPT-5.5 on MLS Bench Lite, 35.1 versus 35.5. Each model ran in a different harness: K2.7-Code in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.

Reasoning-Token Efficiency: A Cost Claim, Not Just Quality

Moonshot reports about 30% lower reasoning-token usage than K2.6. It frames this as ‘less overthinking.’

Reasoning tokens bill as output tokens on most price cards. Agentic coding runs hundreds or thousands of steps. Each plan, retry, and verification pays the thinking cost again. A 30% cut compounds across a long run.

The effect lands in three places at once. First, lower output-token cost per task. Second, faster steps, which helps interactive CLI sessions. Third, more steps before hitting context limits.
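The third point can be sketched with simple arithmetic. The example below assumes that every token produced in a step, reasoning included, stays in the context window, which depends on how a given harness manages history. The per-step token counts are hypothetical, chosen only to show the shape of the effect.

```python
CONTEXT_TOKENS = 262_144   # 256K window (reported)
REASONING_CUT = 0.30       # ~30% fewer reasoning tokens (reported)

def steps_before_limit(reasoning_per_step: int, other_per_step: int,
                       cut: float = 0.0) -> int:
    """Whole agent steps that fit in the window if all step tokens stay in context."""
    per_step = reasoning_per_step * (1.0 - cut) + other_per_step
    return int(CONTEXT_TOKENS // per_step)

# Hypothetical workload: 1,200 reasoning tokens and 800 other tokens per step.
baseline = steps_before_limit(1_200, 800)
reduced = steps_before_limit(1_200, 800, cut=REASONING_CUT)
print(baseline, reduced)
```

Under these assumed numbers the run gains a few dozen extra steps before the window fills. The real gain depends on how much of each step is reasoning.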

Use Cases With Examples

Repo-scale refactors are the main use case. Point the agent at a failing test suite. It reads files, edits across modules, then reruns tests until green.
Code review is a second fit. Feed it a pull request diff and ask for a risk assessment. The 256K window holds large diffs, logs, and related files together.
MCP tool-use workflows are a third fit. K2.7-Code scored 81.1 on MCP Mark Verified. That suite tests correct tool invocation through the Model Context Protocol. Think CI checks, ticket updates, and file edits in a single loop.
Long-context analysis is a fourth fit. The model accepts text, image, and video input. Documentation, screenshots, and a recorded repro can share one prompt.
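The first use case, rerunning tests until they pass, boils down to a bounded loop. The sketch below shows that control flow only. `run_tests` and `apply_fix` are hypothetical callables standing in for a real test runner and for the agent's edit step; nothing here reflects how Kimi Code is actually built.

```python
from typing import Callable

def fix_until_green(
    run_tests: Callable[[], list[str]],
    apply_fix: Callable[[list[str]], None],
    max_rounds: int = 10,
) -> int:
    """Rerun tests and apply fixes until none fail; return the fix rounds used.

    run_tests returns the names of failing tests. Raises RuntimeError if
    tests still fail once the round budget is spent.
    """
    for rounds_used in range(max_rounds):
        failing = run_tests()
        if not failing:
            return rounds_used
        apply_fix(failing)
    if not run_tests():
        return max_rounds
    raise RuntimeError(f"still failing after {max_rounds} rounds")

# Toy stand-ins: each "fix" resolves exactly one failing test.
failing_tests = ["test_parse", "test_io", "test_cli"]
rounds = fix_until_green(
    run_tests=lambda: list(failing_tests),
    apply_fix=lambda failing: failing_tests.remove(failing[0]),
)
print(rounds)
```

The round budget matters in practice: each round costs reasoning tokens, so an agent that cannot converge should stop and report rather than loop indefinitely.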

Marktechpost’s Interactive Explorer

Kimi K2.7-Code: Interactive Explorer

Company-reported benchmarks and official API pricing. Released June 12, 2026. Verified June 12, 2026.

Benchmarks

Source: Moonshot AI Kimi K2.7-Code model card. K2.7-Code ran in Kimi Code CLI; GPT-5.5 in Codex xhigh; Claude Opus 4.8 in Claude Code xhigh. These are first-party numbers, not an independent leaderboard.

Specs

Source: Kimi K2.7-Code Hugging Face model card and Kimi API docs.

Cost Calculator

Default inputs:

Input tokens / run: 50,000
Output tokens / run: 8,000
Cache hit rate: 50%
Runs / month: 1,000
Reasoning share of output: 40%

Rates: cached input $0.19 / 1M, cache-miss input $0.95 / 1M, output $4.00 / 1M (official Kimi pricing). The savings estimate applies K2.7-Code’s reported ~30% lower reasoning-token usage vs K2.6 to the reasoning share of output. Estimate only.
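The calculator's arithmetic is easy to reproduce offline. The sketch below uses the quoted rates, blends the cached and cache-miss input prices by the cache hit rate, and applies the illustrative 30% reduction to the reasoning share of output. The function name and return shape are choices made for this example.

```python
# Rates in USD per 1M tokens, as quoted in the explorer.
RATE_CACHED_INPUT = 0.19
RATE_MISS_INPUT = 0.95
RATE_OUTPUT = 4.00

def monthly_cost(input_tokens, output_tokens, cache_hit_rate, runs, reasoning_share):
    """Return input cost, output cost, total, and the illustrative reasoning saving."""
    blended_input_rate = (cache_hit_rate * RATE_CACHED_INPUT
                          + (1 - cache_hit_rate) * RATE_MISS_INPUT)
    input_cost = runs * input_tokens * blended_input_rate / 1e6
    output_cost = runs * output_tokens * RATE_OUTPUT / 1e6
    # Illustrative: 30% of the reasoning share of output, priced at the output rate.
    saved = runs * (output_tokens * reasoning_share * 0.30) * RATE_OUTPUT / 1e6
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
        "saved": saved,
    }

# Default inputs: 50,000 in, 8,000 out, 50% cache hits, 1,000 runs, 40% reasoning.
estimate = monthly_cost(50_000, 8_000, 0.50, 1_000, 0.40)
for key, value in estimate.items():
    print(f"{key}: ${value:,.2f}")
```

With the default inputs this works out to about $60.50 per month, of which roughly $3.84 is the saving attributed to the reasoning-token reduction. As with the explorer itself, treat the result as an estimate only.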
