Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks
Alibaba’s Qwen Team has released Qwen3.6-27B, the first dense open-weight model in the Qwen3.6 family, and arguably the most capable 27-billion-parameter model available today for coding agents. It brings substantial improvements in agentic coding, a novel Thinking Preservation mechanism, and a hybrid architecture that blends Gated DeltaNet linear attention with conventional self-attention, all under an Apache 2.0 license.
The release comes weeks after Qwen3.6-35B-A3B, a sparse Mixture-of-Experts (MoE) model with only 3B active parameters that itself followed the broader Qwen3.5 series. Qwen3.6-27B is the family’s second model and the first fully dense variant, and on several key benchmarks it actually outperforms both Qwen3.6-35B-A3B and the much larger Qwen3.5-397B-A17B MoE model. The Qwen team describes the release as prioritizing “stability and real-world utility,” shaped by direct community feedback rather than benchmark optimization.
The Qwen team has released two weight variants on the Hugging Face Hub: Qwen/Qwen3.6-27B in BF16 and Qwen/Qwen3.6-27B-FP8, a quantized version using fine-grained FP8 quantization with a block size of 128, with performance metrics nearly identical to the original model. Both variants are compatible with SGLang (>=0.5.10), vLLM (>=0.19.0), KTransformers, and Hugging Face Transformers.

What’s New: Two Key Features
Agentic coding is the first major upgrade. The model has been specifically optimized to handle frontend workflows and repository-level reasoning: tasks that require understanding a large codebase, navigating file structures, editing across multiple files, and producing consistent, runnable output. On QwenWebBench, an internal bilingual (EN/CN) front-end code generation benchmark spanning seven categories (Web Design, Web Apps, Games, SVG, Data Visualization, Animation, and 3D), Qwen3.6-27B scores 1487, a significant jump from 1068 for Qwen3.5-27B and 1397 for Qwen3.6-35B-A3B. On NL2Repo, which tests repository-level code generation, the model scores 36.2 versus 27.3 for Qwen3.5-27B. On SWE-bench Verified, the community standard for autonomous software engineering agents, it reaches 77.2, up from 75.0, and is competitive with Claude 4.5 Opus’s 80.9.
Thinking Preservation is the second, and arguably more architecturally interesting, addition. By default, most LLMs only retain the chain-of-thought (CoT) reasoning generated for the current user message; reasoning from earlier turns is discarded. Qwen3.6 introduces a new option, enabled via "chat_template_kwargs": {"preserve_thinking": True} in the API, to retain and leverage thinking traces from historical messages across the entire conversation. For iterative agent workflows, this is practically significant: the model carries forward earlier reasoning context rather than re-deriving it each turn. This can reduce overall token consumption by minimizing redundant reasoning and also improve KV cache utilization.
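As a minimal sketch, this is what a request enabling Thinking Preservation could look like against an OpenAI-compatible endpoint (the endpoint URL and example messages are assumptions; the key detail is the `chat_template_kwargs` field from the release notes):

```python
import json

# Hypothetical payload for an OpenAI-compatible server (e.g. vLLM or SGLang)
# hosting Qwen3.6-27B. "chat_template_kwargs": {"preserve_thinking": True}
# asks the chat template to keep thinking traces from earlier turns in the
# prompt instead of discarding them.
payload = {
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
        {"role": "user", "content": "Refactor utils.py to remove the global cache."},
        {"role": "assistant", "content": "Done. I moved the cache into a class."},
        {"role": "user", "content": "Now add unit tests for the new class."},
    ],
    "chat_template_kwargs": {"preserve_thinking": True},
}

body = json.dumps(payload).encode("utf-8")
# POST `body` to http://localhost:8000/v1/chat/completions with any HTTP client.
```

In a multi-turn coding session, each subsequent request would carry the growing message history, and the preserved thinking lets the server reuse cached prefix KV entries rather than re-deriving the reasoning.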
Under the Hood: A Hybrid Architecture
Qwen3.6-27B is a causal language model with a vision encoder. It is natively multimodal, supporting text, image, and video inputs, and is trained through both pre-training and post-training stages.
The model has 27B parameters distributed across 64 layers, with a hidden dimension of 5120 and a token vocabulary of 248,320 (padded). The layer layout follows a distinctive repeating pattern: 16 blocks, each structured as 3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN). This means three out of every four sublayers use Gated DeltaNet, a form of linear attention, with only every fourth sublayer using standard gated attention.
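The repeating pattern can be written out as a simple layer-type schedule (an illustrative sketch only; the real implementation lives inside the model’s architecture code):

```python
# Illustrative sketch of the 64-layer schedule: 16 blocks, each with
# 3 linear-attention sublayers followed by 1 full-attention sublayer.
def layer_schedule(num_blocks: int = 16) -> list[str]:
    schedule = []
    for _ in range(num_blocks):
        schedule.extend(["gated_deltanet"] * 3)  # linear attention, O(n)
        schedule.append("gated_attention")       # full attention, O(n^2)
    return schedule

layers = layer_schedule()
print(len(layers))                         # 64 layers total
print(layers.count("gated_deltanet"))      # 48 linear-attention sublayers
print(layers.count("gated_attention"))     # 16 full-attention sublayers
```

The 3:1 ratio is the source of the model’s long-context efficiency: only a quarter of the layers pay quadratic attention cost or hold a growing KV cache.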
What is Gated DeltaNet? Traditional self-attention computes relationships between every token pair, which scales quadratically (O(n²)) with sequence length and becomes expensive for long contexts. Linear attention mechanisms like DeltaNet approximate this with linear complexity (O(n)), making them significantly faster and more memory-efficient. Gated DeltaNet adds a gating mechanism on top, essentially learning when to update or retain information, similar in spirit to LSTM gating but applied to the attention computation. In Qwen3.6-27B, the Gated DeltaNet sublayers use 48 linear attention heads for values (V) and 16 for queries and keys (QK), with a head dimension of 128.
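To make the O(n) claim concrete, here is a toy, dependency-free sketch of the gated delta rule that this family of layers is built on (dimensions and values are illustrative, not the model’s actual kernels): the layer keeps a fixed-size state matrix and updates it once per token, instead of attending over all past tokens.

```python
# Toy sketch of the gated delta rule behind DeltaNet-style linear attention.
# A fixed-size state matrix S carries a running key -> value mapping; each
# token triggers one O(d_k * d_v) update, so a sequence costs O(n) overall.

def delta_step(S, k, v, beta, alpha):
    """One gated delta-rule update.

    S     : state matrix (d_k x d_v)
    k     : key vector (d_k), assumed unit-normalized
    v     : value vector (d_v)
    beta  : write strength in [0, 1]
    alpha : forget gate in [0, 1] (the "gated" part: learned decay)
    """
    d_k, d_v = len(k), len(v)
    # What the current state predicts for this key: v_hat = S^T k
    v_hat = [sum(S[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # Delta rule: decay the old state, then write beta * (v - v_hat) at key k
    return [
        [alpha * S[i][j] + beta * k[i] * (v[j] - v_hat[j]) for j in range(d_v)]
        for i in range(d_k)
    ]

def read(S, q):
    """Read out: o = S^T q."""
    return [sum(S[i][j] * q[i] for i in range(len(S))) for j in range(len(S[0]))]

# Store one (k, v) pair, then read it back with q == k.
S = [[0.0, 0.0], [0.0, 0.0]]        # 2x2 state, starts empty
k, v = [1.0, 0.0], [3.0, 5.0]
S = delta_step(S, k, v, beta=1.0, alpha=1.0)
print(read(S, k))                    # -> [3.0, 5.0], the stored value
```

The gate `alpha` is what distinguishes Gated DeltaNet from plain DeltaNet: it lets the layer learn to forget stale associations rather than accumulating them indefinitely.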
The gated attention sublayers use 24 attention heads for queries (Q) and only 4 for keys and values (KV), a grouped-query configuration that significantly reduces KV cache memory at inference time. These layers have a head dimension of 256 and use Rotary Position Embedding (RoPE) with a rotary dimension of 64. The FFN intermediate dimension is 17,408.
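A back-of-the-envelope calculation shows why the 4 KV heads and the hybrid layout matter for serving cost (assuming BF16 cache entries and that only the 16 full-attention layers keep a KV cache, since the linear-attention layers keep a fixed-size state instead):

```python
# KV cache size = layers_with_kv * 2 (K and V) * kv_heads * head_dim * bytes/elem
def kv_cache_bytes(tokens, layers_with_kv=16, kv_heads=4, head_dim=256, dtype_bytes=2):
    return tokens * layers_with_kv * 2 * kv_heads * head_dim * dtype_bytes

print(kv_cache_bytes(1) // 1024)          # 64 KiB per token
print(kv_cache_bytes(262_144) // 2**30)   # 16 GiB at the full native context
# With 24 KV heads (one per query head, no grouped-query sharing),
# the cache would be 6x larger at every context length.
```

This is the practical payoff of combining grouped-query attention with the 3:1 DeltaNet ratio: the quadratic-attention cache only grows in a quarter of the layers, and each of those layers shares 4 KV heads across 24 query heads.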
The model also uses Multi-Token Prediction (MTP), trained with multiple prediction steps. At inference time, this enables speculative decoding, where the model generates several candidate tokens at once and verifies them in parallel, improving throughput without compromising quality.
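The verification step of speculative decoding can be sketched as follows (a simplified greedy-matching variant, not the model’s actual sampling-aware acceptance rule): the MTP head drafts several tokens, the main model scores them in one parallel pass, and the longest agreeing prefix is kept.

```python
# Toy sketch of speculative-decoding verification. The draft proposes a run
# of tokens; the target model's own next-token choices are computed for the
# same positions in parallel; we accept draft tokens up to the first
# disagreement, then substitute the target's token at that position.
def accept_prefix(draft_tokens, target_tokens):
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            return accepted + [t]  # first mismatch: keep the target's token
        accepted.append(d)
    return accepted

print(accept_prefix([5, 9, 2, 7], [5, 9, 4, 7]))  # -> [5, 9, 4]
```

Because every accepted token matches what the target model would have emitted anyway, throughput improves without changing the output distribution.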
Context Window: 262K Native, 1M with YaRN
Natively, Qwen3.6-27B supports a context length of 262,144 tokens, enough to hold a large codebase or a book-length document. For tasks exceeding this, the model supports YaRN (Yet another RoPE extensioN) scaling, extensible up to 1,010,000 tokens. The Qwen team advises keeping the context at least 128K tokens to preserve the model’s thinking capabilities.
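As a sketch, enabling YaRN for recent Qwen releases on Hugging Face Transformers typically means adding a `rope_scaling` block to `config.json`; the field names below follow that convention, but the exact values for this model are an assumption:

```python
# Hypothetical config.json patch for YaRN context extension, expressed as a
# Python dict. Field names follow the rope_scaling convention used by recent
# Qwen releases; the scaling factor here is an illustrative assumption.
rope_patch = {
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,  # 262,144 * ~3.85 ≈ 1,010,000; 4.0 as a round value
        "original_max_position_embeddings": 262_144,
    },
    "max_position_embeddings": 1_010_000,
}
```

Since YaRN scaling can degrade quality at short contexts, it is usually applied only when a deployment genuinely needs beyond-native context lengths.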
Benchmark Performance
On agentic coding benchmarks, the gains over Qwen3.5-27B are substantial. SWE-bench Pro scores 53.5 versus 51.2 for Qwen3.5-27B and 50.9 for the much larger Qwen3.5-397B-A17B, meaning the 27B dense model exceeds a 397B MoE on this task. SWE-bench Multilingual scores 71.3 versus 69.3 for Qwen3.5-27B. Terminal-Bench 2.0, evaluated under a 3-hour timeout with 32 CPUs and 48 GB RAM, reaches 59.3, matching Claude 4.5 Opus exactly and outperforming Qwen3.6-35B-A3B (51.5). AbilitiesBench Avg5 shows the most striking gain: 48.2 versus 27.2 for Qwen3.5-27B, a 77% relative improvement, also well above Qwen3.6-35B-A3B’s 28.7.
On reasoning benchmarks, GPQA Diamond reaches 87.8 (up from 85.5), AIME26 hits 94.1 (up from 92.6), and LiveCodeBench v6 scores 83.9 (up from 80.7).
Vision-language benchmarks show consistent parity or improvement over Qwen3.5-27B. VideoMME (with subtitles) reaches 87.7, AndroidWorld (a visual agent benchmark) scores 70.3, and VlmsAreBlind, which probes for common visual understanding failure modes, scores 97.0.

Key Takeaways
- Qwen3.6-27B is Alibaba’s first dense open-weight model in the Qwen3.6 family, built to prioritize real-world coding utility over benchmark performance, and licensed under Apache 2.0.
- The model introduces Thinking Preservation, a new feature that retains reasoning traces across conversation history, reducing redundant token generation and improving KV cache efficiency in multi-turn agent workflows.
- Agentic coding performance is the key strength: Qwen3.6-27B scores 77.2 on SWE-bench Verified, 59.3 on Terminal-Bench 2.0 (matching Claude 4.5 Opus), and 1487 on QwenWebBench, outperforming both its predecessor Qwen3.5-27B and the larger Qwen3.5-397B-A17B MoE model on multiple tasks.
- The architecture uses a hybrid Gated DeltaNet + Gated Attention layout across 64 layers: three out of every four sublayers use efficient linear attention (Gated DeltaNet), with Multi-Token Prediction (MTP) enabling speculative decoding at serving time.
- Two weight variants are available on the Hugging Face Hub, Qwen3.6-27B (BF16) and Qwen3.6-27B-FP8 (fine-grained FP8 with block size 128), both supporting SGLang, vLLM, KTransformers, and Hugging Face Transformers, with a native 262,144-token context window extensible to 1,010,000 tokens via YaRN.
Check out the technical details, Qwen/Qwen3.6-27B, and Qwen/Qwen3.6-27B-FP8.
The post Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks appeared first on MarkTechPost.
