
Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

The open-source AI landscape has a new entry worth paying attention to. The Qwen team at Alibaba has released Qwen3.6-35B-A3B, the first open-weight model from the Qwen3.6 generation, and it makes a compelling argument that parameter efficiency matters far more than raw model size. With 35 billion total parameters but only 3 billion activated during inference, the model delivers agentic coding performance competitive with dense models ten times its active size.

What is a Sparse MoE Model, and Why Does it Matter Here?

A Mixture of Experts (MoE) model does not run all of its parameters on every forward pass. Instead, it routes each input token through a small subset of specialized sub-networks called "experts," while the rest of the parameters sit idle. This means a model can have an enormous total parameter count while keeping inference compute, and therefore inference cost and latency, proportional only to the active parameter count.

Qwen3.6-35B-A3B is a causal language model with a vision encoder, trained through both pre-training and post-training stages, with 35 billion total parameters of which 3 billion are activated. Each MoE layer contains 256 experts in total, with 8 routed experts and 1 shared expert activated per token.
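The per-token routing arithmetic can be sketched in plain Python. This is a toy illustration of top-k expert selection, not the actual Qwen implementation (real routers are learned linear layers operating on batched tensors); only the expert counts come from the article.

```python
import math
import random

NUM_EXPERTS = 256   # routed experts per MoE layer
TOP_K = 8           # routed experts activated per token
NUM_SHARED = 1      # shared expert, always active

def route_token(router_logits, top_k=TOP_K):
    """Pick the top-k experts for one token; softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
active = len(selected) + NUM_SHARED

print(f"experts touched for this token: {active} of {NUM_EXPERTS + NUM_SHARED}")
```

Only 9 of the 257 expert networks run for any given token, which is the mechanism behind the 3B-active / 35B-total split.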

The architecture introduces an unusual hybrid layout worth understanding: the model uses a pattern of 10 blocks, each consisting of three instances of (Gated DeltaNet → MoE) followed by one instance of (Gated Attention → MoE). Across the 40 resulting layers, the Gated DeltaNet sublayers handle linear attention, a computationally cheaper alternative to standard self-attention, while the Gated Attention sublayers use Grouped Query Attention (GQA) with 16 attention heads for queries and only 2 for keys and values, significantly reducing KV-cache memory pressure during inference. The model supports a native context length of 262,144 tokens, extensible up to 1,010,000 tokens using YaRN (Yet another RoPE extensioN) scaling.
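The layer schedule and the GQA saving can be made concrete with a few lines of arithmetic. The block pattern, head counts, and context length are from the article; the head dimension and fp16 cache size are illustrative assumptions, not published figures.

```python
# 10 blocks of (3 x Gated DeltaNet + 1 x Gated Attention) = 40 layers total.
layers = (["gated_deltanet"] * 3 + ["gated_attention"]) * 10

num_q_heads, num_kv_heads = 16, 2
head_dim, bytes_per_val = 128, 2   # assumed head_dim, fp16 cache entries
ctx = 262_144                      # native context length

# KV cache per full-attention layer: K and V, each [ctx, kv_heads, head_dim].
per_layer_bytes = 2 * ctx * num_kv_heads * head_dim * bytes_per_val
total_gib = per_layer_bytes * layers.count("gated_attention") / 2**30

print(f"full-attention layers: {layers.count('gated_attention')} of {len(layers)}")
print(f"GQA KV-head reduction: {num_q_heads // num_kv_heads}x")
print(f"KV cache at full context (assumed dims): {total_gib:.1f} GiB")
```

Because only 10 of the 40 layers maintain a KV cache at all, and those use 2 KV heads instead of 16, the cache footprint at the full 262K context stays modest under these assumptions.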

Agentic Coding is Where This Model Gets Serious

On SWE-bench Verified, the canonical benchmark for real-world GitHub issue resolution, Qwen3.6-35B-A3B scores 73.4, compared with 70.0 for Qwen3.5-35B-A3B and 52.0 for Gemma4-31B. On Terminal-Bench 2.0, which evaluates an agent completing tasks inside a real terminal environment with a three-hour timeout, Qwen3.6-35B-A3B scores 51.5, the highest among all compared models, including Qwen3.5-27B (41.6), Gemma4-31B (42.9), and Qwen3.5-35B-A3B (40.5).

Frontend code generation shows the sharpest improvement. On QwenWebBench, an internal bilingual front-end code generation benchmark covering seven categories (Web Design, Web Apps, Games, SVG, Data Visualization, Animation, and 3D), Qwen3.6-35B-A3B achieves a score of 1,397, well ahead of Qwen3.5-27B (1068) and Qwen3.5-35B-A3B (978).

On STEM and reasoning benchmarks, the numbers are equally striking. Qwen3.6-35B-A3B scores 92.7 on AIME 2026 (the full AIME I & II) and 86.0 on GPQA Diamond, a graduate-level scientific reasoning benchmark, both competitive with much larger models.

Multimodal Vision Performance

Qwen3.6-35B-A3B just isn’t a text-only mannequin. It ships with a imaginative and prescient encoder and handles picture, doc, video, and spatial reasoning duties natively.

On MMMU (Massive Multi-discipline Multimodal Understanding), a benchmark that tests university-level reasoning across images, Qwen3.6-35B-A3B scores 81.7, outperforming Claude-Sonnet-4.5 (79.6) and Gemma4-31B (80.4). On RealWorldQA, which tests visual understanding in real-world photographic contexts, the model achieves 85.3, ahead of Qwen3.5-27B (83.7) and significantly above Claude-Sonnet-4.5 (70.3) and Gemma4-31B (72.3).

Spatial intelligence is another area of measurable gain. On ODInW13, an object detection benchmark, Qwen3.6-35B-A3B scores 50.8, up from 42.6 for Qwen3.5-35B-A3B. For video understanding, it achieves 83.7 on VideoMMMU, outperforming Claude-Sonnet-4.5 (77.6) and Gemma4-31B (81.6).

https://qwen.ai/blog?id=qwen3.6-35b-a3b

Thinking Mode, Non-Thinking Mode, and a Key Behavioral Change

One of the more practically useful design choices in Qwen3.6 is explicit control over the model's reasoning behavior. Qwen3.6 models operate in thinking mode by default, producing reasoning content enclosed within <think> tags before generating the final response. Developers who need faster, direct responses can disable this via an API parameter, setting "enable_thinking": False in the chat template kwargs. However, AI practitioners migrating from Qwen3 should note an important behavioral change: Qwen3.6 does not officially support Qwen3's soft switches, i.e., /think and /no_think. Mode switching must be done through the API parameter rather than inline prompt tokens.
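As a concrete sketch, here is how that parameter might be passed through an OpenAI-compatible endpoint such as one served by vLLM or SGLang. The request shape and the chat_template_kwargs pass-through key are common-convention assumptions; only the "enable_thinking": False parameter itself comes from the article.

```python
import json

# Request body for an OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "Qwen3.6-35B-A3B",
    "messages": [
        {"role": "user", "content": "Summarize GQA in one sentence."},
    ],
    # Forwarded to the chat template; skips the thinking block entirely.
    # Note: the inline soft switches from Qwen3 no longer apply here.
    "chat_template_kwargs": {"enable_thinking": False},
}
print(json.dumps(payload, indent=2))
```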

The more novel addition is a feature called Thinking Preservation. By default, only the thinking blocks generated for the latest user message are retained; Qwen3.6 has additionally been trained to preserve and leverage thinking traces from historical messages, which can be enabled by setting the preserve_thinking option. This capability is particularly valuable for agent scenarios, where maintaining full reasoning context can improve decision consistency, reduce redundant reasoning, and improve KV cache utilization in both thinking and non-thinking modes.
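The behavioral difference is easiest to see with a mock of the history-preparation step. This is an illustrative stand-in, not the real logic (which lives in the model's chat template and is toggled by the preserve_thinking option); the tag format and the default-stripping behavior follow the description above.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.S)

def prepare_history(messages, preserve_thinking=False):
    """Drop thinking blocks from all but the latest assistant turn,
    mimicking the described default; keep everything when preserving."""
    if preserve_thinking:
        return messages
    last = max((i for i, m in enumerate(messages)
                if m["role"] == "assistant"), default=-1)
    return [m if m["role"] != "assistant" or i == last
            else {**m, "content": THINK_RE.sub("", m["content"])}
            for i, m in enumerate(messages)]

history = [
    {"role": "user", "content": "Find the failing test."},
    {"role": "assistant", "content": "<think>trace A</think>It is test_io."},
    {"role": "user", "content": "Now fix it."},
    {"role": "assistant", "content": "<think>trace B</think>Patched."},
]

stripped = prepare_history(history)
kept = prepare_history(history, preserve_thinking=True)
print(stripped[1]["content"])   # earlier trace removed
print(kept[1]["content"])       # earlier trace preserved
```

Preserving earlier traces keeps the prompt prefix stable across agent turns, which is what allows the KV cache to be reused rather than recomputed.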

Key Takeaways

  • Qwen3.6-35B-A3B is a sparse Mixture of Experts model with 35 billion total parameters but only 3 billion activated at inference time, making it significantly cheaper to run than its total parameter count suggests, without sacrificing performance on complex tasks.
  • The model's agentic coding capabilities are its strongest suit, with a score of 51.5 on Terminal-Bench 2.0 (the highest among all compared models), 73.4 on SWE-bench Verified, and a dominant 1,397 on QwenWebBench, which covers frontend code generation across seven categories including Web Apps, Games, and Data Visualization.
  • Qwen3.6-35B-A3B is a natively multimodal model, supporting image, video, and document understanding out of the box, with scores of 81.7 on MMMU, 85.3 on RealWorldQA, and 83.7 on VideoMMMU, outperforming Claude-Sonnet-4.5 and Gemma4-31B on each of these.
  • The model introduces a new Thinking Preservation feature that allows reasoning traces from prior conversation turns to be retained and reused across multi-step agent workflows, reducing redundant reasoning and improving KV cache efficiency in both thinking and non-thinking modes.
  • Released under Apache 2.0, the model is fully open for commercial use and is compatible with the major open-source inference frameworks (SGLang, vLLM, KTransformers, and Hugging Face Transformers), with KTransformers specifically enabling CPU-GPU heterogeneous deployment for resource-constrained environments.
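As a deployment sketch, the snippet below assembles (without executing) a vLLM launch command for an OpenAI-compatible server. The Hugging Face repo id, the reasoning-parser name, and the flag values are assumptions for illustration; consult the model card for the officially recommended settings.

```python
# Build, but do not execute, a vLLM serve command for the model.
serve_cmd = [
    "vllm", "serve",
    "Qwen/Qwen3.6-35B-A3B",          # hypothetical Hugging Face repo id
    "--max-model-len", "262144",     # native context length from the article
    "--reasoning-parser", "qwen3",   # assumed parser for the thinking blocks
]
print(" ".join(serve_cmd))
```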

Check out the Technical details and Model Weights.


The post Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities appeared first on MarkTechPost.
