
Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning

In the current landscape of generative AI, 'scaling laws' have often dictated that more parameters equal more intelligence. However, Liquid AI is challenging this convention with the release of LFM2.5-350M. The model is effectively a technical case study in intelligence density, combining extended pre-training (scaled from 10T to 28T tokens) with large-scale reinforcement learning.

The significance of LFM2.5-350M lies in its architecture and training efficiency. While most AI companies have focused on frontier models, Liquid AI is targeting the 'edge' (devices with limited memory and compute) by proving that a 350-million-parameter model can outperform models more than twice its size on several evaluated benchmarks.

https://www.liquid.ai/blog/lfm2-5-350m-no-size-left-behind

Architecture: The Hybrid LIV Backbone

The core technical differentiator of LFM2.5-350M is its departure from the pure Transformer architecture. It uses a hybrid structure built on Linear Input-Varying Systems (LIVs).

Traditional Transformers rely entirely on self-attention mechanisms, which suffer from quadratic scaling issues: as the context window grows, the memory and computational requirements of the Key-Value (KV) cache increase. Liquid AI addresses this by using a hybrid backbone consisting of:

  • 10 Double-Gated LIV Convolution Blocks: These handle the majority of sequence processing. LIVs function similarly to advanced Recurrent Neural Networks (RNNs) but are designed to be more parallelizable and stable during training. They maintain a constant-size state, reducing I/O overhead.
  • 6 Grouped Query Attention (GQA) Blocks: By integrating a small number of attention blocks, the model retains high-precision retrieval and long-range context handling without the full memory overhead of a standard Transformer.

This hybrid approach allows LFM2.5-350M to support a 32k context window (32,768 tokens) while maintaining an extremely lean memory footprint.
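The KV-cache saving from this layout can be sketched with back-of-envelope arithmetic: only the 6 GQA blocks out of 16 total need to cache keys and values, while the 10 LIV blocks keep constant-size state. Note that the KV-head count, head dimension, and fp16 dtype below are illustrative assumptions, not published specifications.

```python
# Compare the KV cache of a hypothetical all-attention 16-block stack
# against the hybrid layout, where only 6 of 16 blocks cache K/V.
# n_kv_heads, head_dim, and dtype are assumed values for illustration.

def kv_cache_bytes(n_attn_layers: int, seq_len: int,
                   n_kv_heads: int = 4, head_dim: int = 64,
                   bytes_per_elem: int = 2) -> int:
    """Total bytes for K and V caches across all attention layers."""
    return 2 * n_attn_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

SEQ = 32_768  # the full 32k context window

full = kv_cache_bytes(n_attn_layers=16, seq_len=SEQ)    # if every block used attention
hybrid = kv_cache_bytes(n_attn_layers=6, seq_len=SEQ)   # only the 6 GQA blocks

print(f"all-attention KV cache: {full / 2**20:.0f} MiB")   # 512 MiB
print(f"hybrid (6 GQA) KV cache: {hybrid / 2**20:.0f} MiB")  # 192 MiB
print(f"reduction: {1 - hybrid / full:.2f}")
```

Whatever the true head configuration, the cache scales linearly with the number of attention blocks, so replacing 10 of 16 blocks with LIV convolutions cuts the KV cache by 62.5% at any context length.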

Performance and Intelligence Density

LFM2.5-350M was pre-trained on 28 trillion tokens, an extremely high training-to-parameter ratio. This ensures that the model's limited parameter count is used to its full potential, resulting in high 'intelligence density.'
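The arithmetic behind that claim is simple, and worth making explicit (the ~20:1 figure is the commonly cited Chinchilla-optimal heuristic, included here only for contrast):

```python
# Tokens seen per parameter: the "intelligence density" ratio.
params = 350e6   # 350M parameters
tokens = 28e12   # 28T pre-training tokens

ratio = tokens / params
print(f"token-to-parameter ratio: {ratio:,.0f}:1")  # 80,000:1

# The Chinchilla-optimal heuristic is roughly 20 tokens per parameter,
# so this model is trained about 4,000x past that point.
print(f"multiple of the ~20:1 heuristic: {ratio / 20:,.0f}x")  # 4,000x
```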

Benchmarks and Use Cases

LFM2.5-350M is a specialist model designed for high-speed, agentic tasks rather than general-purpose reasoning.

  • IFEval (Instruction Following): 76.96
  • GPQA Diamond: 30.64
  • MMLU-Pro: 20.01

The high IFEval score indicates the model is proficient at following complex, structured instructions, making it suitable for tool use, function calling, and structured data extraction (e.g., JSON). However, the documentation explicitly states that LFM2.5-350M is not recommended for mathematics, complex coding, or creative writing. For those tasks, the reasoning capabilities of larger parameter counts remain necessary.
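A typical structured-extraction harness around such a model looks like the sketch below. The prompt, schema, and `call_model` stub are all hypothetical: in practice `call_model` would be wired to whatever runtime serves the model locally (llama.cpp, Cactus Engine, etc.), and here it is mocked so the validation logic is runnable on its own.

```python
import json

# Fields the model is instructed to return, with expected types (illustrative).
SCHEMA_FIELDS = {"name": str, "email": str, "company": str}

PROMPT = (
    "Extract the contact as JSON with keys name, email, company. "
    "Reply with JSON only.\n\nText: {text}"
)

def call_model(prompt: str) -> str:
    # Stand-in for a real local inference call; returns a canned response.
    return ('{"name": "Ada Lovelace", "email": "ada@example.com", '
            '"company": "Analytical Engines"}')

def extract_contact(text: str) -> dict:
    raw = call_model(PROMPT.format(text=text))
    data = json.loads(raw)  # raises ValueError on malformed model output
    for key, typ in SCHEMA_FIELDS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data

contact = extract_contact("Ada Lovelace <ada@example.com> works at Analytical Engines.")
print(contact["email"])  # ada@example.com
```

The point of the validation loop is that a small instruction-following model is trusted to produce the format, while the caller retains a cheap, deterministic check before the output enters a pipeline.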


Hardware Optimization and Inference Efficiency

A major hurdle for AI developers is the 'memory wall': the bottleneck created by moving data between the processor and memory. Because LFM2.5-350M uses LIVs and GQA, it drastically reduces KV cache size, boosting throughput. On a single NVIDIA H100 GPU, the model can reach a throughput of 40.4K output tokens per second at high concurrency.

The Liquid AI team reports device-specific low-memory inference results that make local deployment viable:

  • Snapdragon 8 Elite NPU: 169MB peak memory using RunAnywhere Q4.
  • Snapdragon GPU: 81MB peak memory using RunAnywhere Q4.
  • Raspberry Pi 5: 300MB using Cactus Engine int8.
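These figures are roughly what a weight-only estimate predicts, assuming Q4 means 4 bits per parameter: 350M parameters at 4 bits is about 175MB, close to the reported 169MB NPU figure, and int8 gives about 350MB against the reported 300MB on the Raspberry Pi 5. The sketch below is that estimate only; real peak memory also depends on activations, runtime overhead, and how each engine maps weights to the accelerator.

```python
# Weight-only storage estimate for a 350M-parameter model at common
# quantization levels. Ignores activations, caches, and runtime overhead.
PARAMS = 350e6

def weight_mb(bits_per_param: float) -> float:
    """Approximate weight storage in decimal megabytes."""
    return PARAMS * bits_per_param / 8 / 1e6

for name, bits in [("Q4 (4-bit)", 4), ("int8 (8-bit)", 8), ("fp16", 16)]:
    print(f"{name:12s} ~{weight_mb(bits):4.0f} MB")
# Q4 (4-bit)   ~ 175 MB
# int8 (8-bit) ~ 350 MB
# fp16         ~ 700 MB
```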

Key Takeaways

  • Extreme Intelligence Density: By training a 350M-parameter model on 28 trillion tokens, the Liquid AI team achieved an extremely high 80,000:1 token-to-parameter ratio, allowing it to outperform models more than twice its size on several benchmarks.
  • Hybrid LIV Architecture: The model departs from pure Transformers by using Linear Input-Varying Systems (LIVs) combined with a small number of Grouped Query Attention (GQA) blocks, significantly reducing the memory overhead of the KV cache.
  • Edge-First Efficiency: It is designed for local deployment with a 32k context window and a remarkably low memory footprint, reaching as little as 81MB on mobile GPUs and 169MB on NPUs via specialized inference engines.
  • Specialized Agentic Capability: The model is highly optimized for instruction following (IFEval: 76.96) and tool use, though it is explicitly not recommended for complex coding, mathematics, or creative writing.
  • Massive Throughput: The architectural efficiency enables high-speed serving, processing up to 40.4K output tokens per second on a single H100, making it ideal for high-volume data extraction and real-time classification.

Check out the Technical details and Model Weights.

The post Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning appeared first on MarkTechPost.
