NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon
Pretraining frontier-scale LLMs in FP8 is now standard practice, but moving to 4-bit floating point has remained an open research problem because narrower formats compress dynamic range and amplify quantization error at long token horizons. New research from NVIDIA describes a pretraining methodology built around NVFP4, a 4-bit microscaling format…
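
To make the dynamic-range problem concrete, the following is a minimal sketch of block-scaled ("microscaling") 4-bit quantization, not NVIDIA's implementation. It assumes the E2M1 value grid commonly used for 4-bit floats, a block size of 16, and plain Python floats for the per-block scale factors; a real format would also store those scales in reduced precision, which this sketch ignores. All function names are hypothetical.

```python
# Illustrative sketch only: block-scaled 4-bit quantization in pure Python.
# Assumptions: E2M1 magnitude grid, block size 16, unquantized float scales.

FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
FP4_MAX = 6.0


def round_to_fp4(x):
    """Round x to the nearest representable magnitude, keeping its sign."""
    mag = min(FP4_GRID, key=lambda g: abs(g - abs(x)))
    return mag if x >= 0 else -mag


def quantize_block(block):
    """Scale a block so its largest magnitude maps to FP4_MAX, round, rescale."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return list(block)
    scale = amax / FP4_MAX
    return [round_to_fp4(v / scale) * scale for v in block]


def quantize(values, block_size):
    """Quantize a flat list using one scale factor per block of block_size."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_block(values[start:start + block_size]))
    return out


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


def demo_errors():
    """Compare one scale for the whole tensor against one scale per 16 values."""
    data = [0.01 * ((i % 7) - 3) for i in range(64)]
    data[5] = 100.0  # a single outlier dominates the shared scale
    per_tensor = mean_abs_error(data, quantize(data, block_size=len(data)))
    per_block = mean_abs_error(data, quantize(data, block_size=16))
    return per_tensor, per_block


if __name__ == "__main__":
    per_tensor, per_block = demo_errors()
    print("per-tensor scale error:", per_tensor)
    print("per-block scale error: ", per_block)
```

With a single scale, the outlier forces every small value to round to zero; with per-block scales, only the block that contains the outlier is affected, which is the intuition behind microscaling formats.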
