
DeepSeek Researchers Introduce DeepSeek-V3.2 and DeepSeek-V3.2-Speciale for Long Context Reasoning and Agentic Workloads

How do you get GPT-5-level reasoning on real long-context, tool-using workloads without paying the quadratic attention and GPU cost that usually makes these systems impractical? DeepSeek researchers introduce DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, reasoning-first models built for agents that target high-quality reasoning, long context and agent workflows, with open weights and production APIs. The models combine DeepSeek Sparse Attention (DSA), a scaled GRPO reinforcement learning stack and an agent-native tool protocol, and report performance comparable to GPT-5, with DeepSeek-V3.2-Speciale reaching Gemini 3.0 Pro level reasoning on public benchmarks and competitions.

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Sparse Attention with Near Linear Long Context Cost

Both DeepSeek-V3.2 and DeepSeek-V3.2-Speciale use the DeepSeek-V3 Mixture of Experts transformer with about 671B total parameters and 37B active parameters per token, inherited from V3.1 Terminus. The only structural change is DeepSeek Sparse Attention, introduced through continued pre-training.
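Below is a minimal, hedged sketch of how Mixture of Experts routing keeps only a fraction of parameters active per token, which is the mechanism behind the 37B-active-of-671B-total figure. The expert count, hidden size and top-k here are toy values, not DeepSeek-V3.2's actual configuration.

```python
# Toy MoE layer: a router picks top-k experts per token, so only a small
# fraction of the total parameters is exercised on any given token.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                      # x: [tokens, d_model]
        scores = self.router(x).softmax(-1)    # routing probabilities per expert
        w, idx = scores.topk(self.top_k, -1)   # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # only the selected experts actually run
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += w[mask, k:k+1] * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```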

DeepSeek Sparse Attention splits attention into two components. A lightning indexer runs a small number of low-precision heads over all token pairs and produces relevance scores. A fine-grained selector keeps the top-k key-value positions per query, and the main attention path runs Multi-Query Attention and Multi-Head Latent Attention over this sparse set.
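The following is an illustrative sketch of the two-component idea, not DeepSeek's implementation: a cheap indexer scores key positions, a top-k selector keeps a small key-value set per query, and full attention runs only over that set. Dimensions, the scoring head and the absence of causal masking are simplifications.

```python
# Sketch of indexer-scored sparse attention: relevance scores pick k KV slots
# per query, and softmax attention runs only over the gathered slots.
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, idx_q, idx_k, top_k=4):
    # q, k, v:      [L, d]    full-precision attention inputs
    # idx_q, idx_k: [L, d_i]  small indexer projections (stand-in for the lightning indexer)
    scores = idx_q @ idx_k.T                          # [L, L] cheap relevance scores
    sel = scores.topk(top_k, dim=-1).indices          # [L, top_k] kept KV positions per query
    k_sel, v_sel = k[sel], v[sel]                     # [L, top_k, d] gathered sparse KV set
    att = (q.unsqueeze(1) * k_sel).sum(-1) / k.shape[-1] ** 0.5   # [L, top_k]
    w = F.softmax(att, dim=-1)
    return (w.unsqueeze(-1) * v_sel).sum(1)           # [L, d] attention over k slots only

L, d, d_i = 16, 32, 8
q, k, v = (torch.randn(L, d) for _ in range(3))
idx_q, idx_k = torch.randn(L, d_i), torch.randn(L, d_i)
print(sparse_attention(q, k, v, idx_q, idx_k).shape)  # torch.Size([16, 32])
```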

This changes the dominant complexity from O(L²) to O(kL), where L is the sequence length and k is the number of selected tokens, which is much smaller than L. On the reported benchmarks, DeepSeek-V3.2 matches the dense Terminus baseline on accuracy while cutting long-context inference cost by about 50 percent, with faster throughput and lower memory use on H800-class hardware and on vLLM and SGLang backends.
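A back-of-the-envelope calculation makes the O(L²) versus O(kL) difference concrete, using the 128K context window and the 2,048 selected key-value entries mentioned in the continued pre-training setup below; note the indexer itself still touches all pairs, but with far cheaper low-precision scoring.

```python
# Dense vs sparse attention score counts at 128K context with k = 2048 selected tokens.
L, k = 128_000, 2048
dense, sparse = L * L, k * L
print(f"score pairs: dense {dense:.2e}, sparse {sparse:.2e}, ratio {dense / sparse:.0f}x")
# dense ~1.64e+10, sparse ~2.62e+08, roughly 62x fewer main-path attention scores
```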

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Continued Pre-Training for DeepSeek Sparse Attention

DeepSeek Sparse Attention (DSA) is introduced by continued pre-training on top of DeepSeek-V3.1 Terminus. In the dense warm-up stage, dense attention stays active, all backbone parameters are frozen and only the lightning indexer is trained, with a Kullback-Leibler loss that matches the dense attention distribution on 128K-context sequences. This stage uses a small number of steps and about 2B tokens, enough for the indexer to learn useful scores.
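A minimal sketch of this warm-up objective, assuming the indexer distribution is aligned to frozen dense attention weights with a KL divergence; tensor shapes and the indexer parameterization are illustrative only.

```python
# Dense warm-up sketch: only the indexer parameters receive gradients, and the
# target is the (frozen) dense attention distribution per query.
import torch
import torch.nn.functional as F

L, d_i = 16, 8
idx_q = torch.randn(L, d_i, requires_grad=True)      # stand-in for indexer parameters
idx_k = torch.randn(L, d_i, requires_grad=True)
dense_attn = F.softmax(torch.randn(L, L), dim=-1)    # target: dense attention weights (frozen)

indexer_logp = F.log_softmax(idx_q @ idx_k.T, dim=-1)            # indexer distribution per query
loss = F.kl_div(indexer_logp, dense_attn, reduction="batchmean")  # KL(dense || indexer)
loss.backward()                                                   # gradients reach only the indexer
print(float(loss))
```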

In the sparse stage, the selector keeps 2,048 key-value entries per query, the backbone is unfrozen and the model continues training on about 944B tokens. Gradients for the indexer still come only from the alignment loss against dense attention on the selected positions. This schedule makes DeepSeek Sparse Attention (DSA) behave as a drop-in replacement for dense attention, with comparable quality and lower long-context cost.
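The sparse-stage indexer signal can be sketched as a KL alignment restricted to the selected positions, with the dense target renormalized over that kept set; k is shrunk from 2,048 to a toy value and the language-model loss on the unfrozen backbone is omitted.

```python
# Sparse-stage sketch: align the indexer to dense attention only on the top-k
# positions it selected, so indexer gradients come solely from this alignment term.
import torch
import torch.nn.functional as F

L, k = 16, 4
indexer_scores = torch.randn(L, L, requires_grad=True)
dense_attn = F.softmax(torch.randn(L, L), dim=-1)

sel = indexer_scores.topk(k, dim=-1).indices                    # [L, k] kept positions per query
target = dense_attn.gather(1, sel)
target = target / target.sum(-1, keepdim=True)                  # renormalize dense weights over kept set
logp = F.log_softmax(indexer_scores.gather(1, sel), dim=-1)
align_loss = F.kl_div(logp, target, reduction="batchmean")
align_loss.backward()
print(float(align_loss))
```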

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

GRPO with More Than 10 Percent RL Compute

On top of the sparse architecture, DeepSeek-V3.2 uses Group Relative Policy Optimization (GRPO) as the main reinforcement learning method. The research team states that post-training reinforcement learning (RL) compute exceeds 10 percent of pre-training compute.

RL is organized around specialist domains. The research team trains dedicated runs for mathematics, competitive programming, general logical reasoning, search and agent tasks, and safety, then distills these specialists into the shared 685B-parameter base for DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. GRPO is implemented with an unbiased KL estimator, off-policy sequence masking and mechanisms that keep Mixture of Experts (MoE) routing and sampling masks consistent between training and sampling.
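As a rough illustration of the group-relative idea in GRPO, the advantage of each sampled response is its reward normalized against the other responses for the same prompt, with no learned value model; the unbiased KL estimator and off-policy sequence masking described above are not reproduced here.

```python
# Group-relative advantages: several responses per prompt, each scored against
# the group mean and standard deviation instead of a critic's value estimate.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: [n_prompts, group_size] scalar rewards for each sampled response
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],    # prompt 1: two of four responses pass
                        [0.0, 0.0, 0.0, 1.0]])   # prompt 2: one of four responses passes
print(grpo_advantages(rewards))
```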

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Agent Data, Thinking Mode and Tool Protocol

The DeepSeek research team builds a large synthetic agent dataset by generating more than 1,800 environments and more than 85,000 tasks across code agents, search agents, general tool use and code-interpreter setups. Tasks are constructed to be hard to solve and easy to verify, and are used as RL targets alongside real coding and search traces.
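A toy example of the "hard to solve, easy to verify" pattern: the reward is a cheap programmatic check on the agent's output, so verification stays trivial even when producing a correct solution is not. The task below is invented for illustration and is not from the DeepSeek dataset.

```python
# Easy-to-verify reward: run the agent's candidate code and check one known
# input/output pair. Any failure or exception yields zero reward.
def reward(agent_output_code: str) -> float:
    scope: dict = {}
    try:
        exec(agent_output_code, scope)               # candidate is expected to define `solve`
        return 1.0 if scope["solve"]([3, 1, 2]) == [1, 2, 3] else 0.0
    except Exception:
        return 0.0

print(reward("def solve(xs): return sorted(xs)"))    # 1.0
print(reward("def solve(xs): return xs"))            # 0.0
```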

At inference time, DeepSeek-V3.2 introduces explicit thinking and non-thinking modes. The deepseek-reasoner endpoint exposes thinking mode by default, where the model produces an internal chain of thought before the final answer. The thinking-with-tools guide describes how reasoning content is kept across tool calls and cleared when a new user message arrives, and how tool calls and tool results stay in the context even when reasoning text is trimmed for budget.
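Assuming OpenAI-style message dictionaries as used by the DeepSeek API, that context rule can be sketched as a small helper that strips reasoning_content once a new user message arrives while leaving tool calls and tool results untouched; the helper itself is hypothetical, not part of the official client.

```python
# Hypothetical context-management helper for the rule described above:
# reasoning persists across tool calls/results, and is dropped on a new user turn.
def on_new_message(history: list[dict], message: dict) -> list[dict]:
    if message.get("role") == "user":
        # A new user turn resets prior chain of thought; tool calls and results remain.
        history = [
            {k: v for k, v in m.items() if k != "reasoning_content"}
            for m in history
        ]
    return history + [message]
```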

The chat template is updated around this behavior. The DeepSeek-V3.2-Speciale repository ships Python encoder and decoder helpers instead of a Jinja template. Messages can carry a reasoning_content field alongside content, controlled by a thinking parameter. A developer role is reserved for search agents and is not accepted in general chat flows by the official API, which protects this channel from accidental misuse.
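A hypothetical request shape consistent with that description is shown below; the exact placement of the thinking switch and the reasoning_content field in the official encoder and decoder helpers may differ, so treat this as an illustration rather than the actual API contract.

```python
# Illustrative payload: an assistant turn carrying both content and reasoning_content,
# with a thinking switch passed alongside the messages (placement is an assumption).
messages = [
    {"role": "user", "content": "Factor 91 into primes."},
    {"role": "assistant",
     "reasoning_content": "91 = 7 * 13, check: 7 * 13 = 91.",
     "content": "91 = 7 × 13."},
    {"role": "user", "content": "Now factor 221."},
]
request = {"model": "deepseek-reasoner", "messages": messages, "thinking": True}
```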

https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/main/assets/paper.pdf

Benchmarks, Competitions And Open Artifacts

On standard reasoning and coding benchmarks, DeepSeek-V3.2 and especially DeepSeek-V3.2-Speciale are reported as comparable to GPT-5 and close to Gemini 3.0 Pro on suites such as AIME 2025, HMMT 2025, GPQA and LiveCodeBench, with improved cost efficiency on long-context workloads.

For formal competitions, the DeepSeek research team states that DeepSeek-V3.2-Speciale achieves gold-medal-level performance at the International Mathematical Olympiad 2025, the Chinese Mathematical Olympiad 2025 and the International Olympiad in Informatics 2025, and competitive gold-medal-level performance at the ICPC World Finals 2025.

Key Takeaways

  1. DeepSeek-V3.2 adds DeepSeek Sparse Attention, which brings near-linear O(kL) attention cost and delivers around 50% lower long-context API cost compared with earlier dense DeepSeek models, while preserving quality similar to DeepSeek-V3.1 Terminus.
  2. The model family keeps the 671B-parameter MoE backbone with 37B active parameters per token and exposes a full 128K context window in production APIs, which makes long documents, multi-step chains and large tool traces practical rather than a lab-only feature.
  3. Post-training uses Group Relative Policy Optimization (GRPO) with a compute budget that is more than 10 percent of pre-training, focused on math, code, general reasoning, search and agent workloads, and safety, including contest-style specialists whose cases are released for external verification.
  4. DeepSeek-V3.2 is the first model in the DeepSeek family to integrate thinking directly into tool use, supporting both thinking and non-thinking tool modes and a protocol where internal reasoning persists across tool calls and is reset only on new user messages.

Check out the Paper and Model weights. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! Are you on Telegram? Now you can join us on Telegram as well.

