NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× the Tokens Per Forward Pass of Qwen3-8B
NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in a single architecture. The models support autoregressive (AR) decoding, diffusion-based parallel decoding, and self-speculation decoding. The family is available in 3B, 8B, and 14B parameter sizes and includes base, instruct, and vision-language variants.

Sequential Decoding Limits Throughput

Standard autoregressive…
