Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters
Liquid AI has just released LFM2.5-8B-A1B, an on-device Mixture-of-Experts (MoE) model built for tool calling. The model holds 8.3B total parameters but activates only 1.5B per token. That sparsity is what lets it run on consumer hardware.
The release follows LFM2-8B-A1B, which the Liquid AI team published earlier. LFM2.5 is a new family of hybrid models for on-device deployment. This version adds a 128K context window, reasoning, and scaled-up training.
What is LFM2.5-8B-A1B
The model uses a sparse MoE design. It activates 1.5B of its 8.3B total parameters per forward pass. That keeps each generated token cheap to compute.
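To make the sparsity concrete, the short sketch below computes the share of weights that take part in a single forward pass, using only the two parameter counts quoted above. The function name is illustrative and not part of any Liquid AI tooling.

```python
TOTAL_PARAMS = 8.3e9   # total parameters, as quoted in the article
ACTIVE_PARAMS = 1.5e9  # parameters activated per token, as quoted in the article


def active_fraction(active: float, total: float) -> float:
    """Return the share of the weights used in one forward pass."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {fraction:.1%}")
print(f"Total-to-active ratio: {1 / fraction:.1f}x")
```

Per-token compute tracks the active count, while all 8.3B weights still have to be stored, which is why the memory footprint and the decode cost are reported separately.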
The architecture has 24 layers. Eighteen are double-gated LIV convolution blocks; six are grouped-query attention (GQA) layers. It combines MoE, GQA, and gated short convolution blocks. The context length is 131,072 tokens. The model covers nine languages, including Arabic, Chinese, and Japanese.
The Liquid AI team recommends a temperature of 0.2, top_k of 80, and repetition_penalty of 1.05.
Unlike its predecessor, LFM2.5-8B-A1B is a reasoning-only model. It produces an explicit chain of thought before its final answer. The Liquid AI team chose this because MoE models run in compute-bound settings. A smaller active parameter count makes each reasoning token cheap.
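As a rough illustration of what these three settings do, here is a minimal pure-Python sampler that applies them in one common order: repetition penalty, then temperature, then top-k. It is a sketch of the general technique, not Liquid AI's decoding code, and the toy logits and the `sample_next_token` helper are invented for the example.

```python
import math
import random

# Recommended values quoted in the article.
RECOMMENDED = {"temperature": 0.2, "top_k": 80, "repetition_penalty": 1.05}


def sample_next_token(logits, previous_tokens, temperature, top_k,
                      repetition_penalty, rng=random):
    """Sample one token id from raw logits (hypothetical helper)."""
    scores = list(logits)
    # Repetition penalty: push down the score of every token already emitted.
    for tok in set(previous_tokens):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty
    # Temperature below 1 sharpens the distribution toward the top tokens.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring token ids.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors, shifted by the max for numerical stability.
    peak = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - peak) for i in keep]
    return rng.choices(keep, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up 4-token vocabulary
token = sample_next_token(toy_logits, previous_tokens=[0],
                          rng=random.Random(0), **RECOMMENDED)
print(token)
```

With a temperature as low as 0.2, almost all probability mass sits on the highest-scoring token, so output is close to greedy decoding while the mild penalty discourages verbatim repetition.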
What Changed Since LFM2-8B-A1B
Liquid expanded the context window from 32,768 to 131,072 tokens (128K). Pretraining scaled from 12T to 38T tokens. The vocabulary roughly doubled from 65,536 to 128,000 tokens.
The larger vocabulary tokenizes non-Latin scripts more efficiently. The Liquid AI team reports the strongest compression gains in Hindi, Thai, Vietnamese, Indonesian, and Arabic. The rest of the architecture stays the same as LFM2-8B-A1B.
How Liquid AI Trained It
The Liquid AI team extended the tokenizer in place rather than retraining it from scratch. It continued BPE merge training from the original merges on a multilingual corpus. New embedding rows are initialized as the mean of their sub-token decompositions. A short two-stage adaptation then recovers quality.
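The mean-of-sub-tokens initialization can be sketched in a few lines. The example below assumes a new token is described by the ids of the old tokens it used to split into; the tiny two-dimensional table and the `init_new_embedding` helper are hypothetical and only illustrate the averaging step.

```python
def init_new_embedding(sub_token_ids, embedding_table):
    """Average the old embedding rows that a new token used to decompose into."""
    rows = [embedding_table[i] for i in sub_token_ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Made-up embedding table for three existing tokens.
old_table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}

# Suppose a new merged token previously tokenized as ids [0, 2].
new_row = init_new_embedding([0, 2], old_table)
print(new_row)  # [1.5, 1.0]
```

Starting each new row near the vectors the model already associates with that text means the adaptation stage only has to refine the new rows rather than learn them from random noise.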
Context extension came in two stages. A 2T-token midtraining phase reached 32K, focused on reasoning, math, and tool use. Raising the RoPE base θ, plus a 400B-token stage, reached 128K.
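Why raising the RoPE base helps with longer contexts can be seen from the standard RoPE frequency formula, where dimension pair i rotates with wavelength 2π · base^(2i/d). The sketch below uses a hypothetical head dimension and two hypothetical base values, since the article does not state the ones Liquid AI used.

```python
import math


def rope_wavelengths(base: float, head_dim: int) -> list[float]:
    """Wavelength, in token positions, of each rotary dimension pair."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 64  # hypothetical head dimension, not taken from the article

for base in (10_000.0, 1_000_000.0):  # hypothetical base values
    longest = rope_wavelengths(base, HEAD_DIM)[-1]
    print(f"base={base:>11,.0f}  longest wavelength ~ {longest:,.0f} positions")
```

A larger base stretches the slowest-rotating dimensions, so positions far apart in a long document still map to distinguishable rotation angles.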
Two reinforcement learning stages target known failure modes. A preference optimization stage reduces ‘doom loops’ in long reasoning traces. It redistributes probability mass toward plausible alternatives. A separate RL shaping reward discourages loop-inducing restart phrases like ‘Wait…’. Another RL stage uses an avg@k-based reward to cut hallucinations. The goal is abstention on queries beyond reliable knowledge.
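The article does not spell out the reward, so the following is only a guess at the general idea: score each of k sampled answers, then average. The scoring values (+1 correct, 0 abstain, -1 wrong) are assumptions made for illustration, not Liquid AI's actual reward.

```python
def score_sample(answer, reference):
    """Hypothetical per-sample score: +1 correct, 0 for abstaining, -1 wrong."""
    if answer is None:  # None stands for "I don't know"
        return 0.0
    return 1.0 if answer == reference else -1.0


def avg_at_k(answers, reference):
    """Mean score over k sampled answers to the same query."""
    return sum(score_sample(a, reference) for a in answers) / len(answers)


# A model that guesses inconsistently: right once, wrong three times.
guessing = avg_at_k(["Paris", "Lyon", "Nice", "Rome"], reference="Paris")
# A model that abstains on all four samples.
abstaining = avg_at_k([None, None, None, None], reference="Paris")
print(guessing, abstaining)  # -0.5 0.0
```

Under these assumed scores, averaging over several samples rewards abstention whenever the model's answers are unreliable, which matches the stated goal of the stage.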

The Benchmark Case
LFM2.5-8B-A1B improves over its predecessor across the board. The AA-Omniscience Non-Hallucination Rate jumped from 7.46 to 63.47. IFEval rose from 79.44 to 91.84. MATH500 climbed from 74.80 to 88.76. Tau² Telecom rose from 13.60 to 88.07.
The Liquid AI team compared the model against dense and MoE alternatives. On instruction following, it matches Gemma-4-26B-A4B-IT on IFEval, and it does so at a fraction of the active parameter count. On Tau² Telecom, it scores 88.07, ahead of much larger models.
The avg@k reward drives a much lower hallucination rate. Accuracy stays reasonable for the model’s size. On agentic benchmarks, it stays competitive with larger models.
| Benchmark | LFM2-8B-A1B | LFM2.5-8B-A1B | Δ |
|---|---|---|---|
| AA-Omniscience Non-Hallucination Rate | 7.46 | 63.47 | +56.01 |
| IFEval | 79.44 | 91.84 | +12.40 |
| MATH500 | 74.80 | 88.76 | +13.96 |
| Tau² Telecom | 13.60 | 88.07 | +74.47 |
Running It: CPU, GPU, and Tooling
The model ships with day-one support across the inference ecosystem. Frameworks include llama.cpp, MLX, vLLM, and SGLang. ONNX and Liquid’s LEAP edge platform are also supported.
On CPU, it decodes 253 tokens/s on an M5 Max and 146 tokens/s on a Ryzen AI Max+ 395, staying under 6 GB of memory throughout. On a phone, it sustains about 30 tokens/s.
On a single NVIDIA H100 SXM5, output throughput reaches 18.5K tokens per second. That is about 1.6B tokens per day at high concurrency.
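The daily figure follows directly from the per-second throughput. A quick check, assuming the rounded 18.5K tokens/s rate is sustained for a full 24 hours:

```python
TOKENS_PER_SECOND = 18_500          # rounded H100 throughput from the article
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

tokens_per_day = TOKENS_PER_SECOND * SECONDS_PER_DAY
print(f"{tokens_per_day:,} tokens/day (~{tokens_per_day / 1e9:.2f}B)")
```

With the rounded rate the product is 1,598,400,000 tokens, which is the roughly 1.6B tokens per day quoted above.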
For tool use, LFM2.5 writes Pythonic function calls by default. They appear between the <|tool_call_start|> and <|tool_call_end|> special tokens. You can override this to JSON in the system prompt.
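On the application side, the default Pythonic format has to be parsed back into a function name and arguments. The sketch below does this with the standard library, without executing the generated text. It assumes the payload is either a single call or a bracketed list of calls with literal arguments; the `get_weather` tool and the sample output string are invented for the example.

```python
import ast
import re

# Non-greedy match of everything between the two special tokens.
TOOL_CALL_PATTERN = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract Pythonic tool calls from model output without running them."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        node = ast.parse(payload.strip(), mode="eval").body
        # Accept one call or a list/tuple of calls (an assumption about the format).
        call_nodes = node.elts if isinstance(node, (ast.List, ast.Tuple)) else [node]
        for call in call_nodes:
            if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
                raise ValueError(f"not a plain function call: {payload!r}")
            calls.append({
                "name": call.func.id,
                # literal_eval only accepts literals, so no code is executed.
                "args": [ast.literal_eval(arg) for arg in call.args],
                "kwargs": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
            })
    return calls


# Hypothetical model output, for illustration only.
sample = 'Checking.<|tool_call_start|>[get_weather(city="Paris", days=3)]<|tool_call_end|>'
parsed = parse_tool_calls(sample)
print(parsed)
```

Parsing with `ast` instead of `eval` keeps the model's output as data: the application decides which named tool to dispatch to, and anything that is not a plain call with literal arguments is rejected.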
Strengths and What to Watch
Strengths:
- Activates only 1.5B parameters, keeping inference cheap on edge hardware
- Competitive instruction-following and agentic scores for its size class
- 128K context window and nine-language coverage
- Open-weight under the LFM1.0 license, with base and post-trained checkpoints
What to Watch:
- Limited knowledge capacity from the small active parameter count
- Not a fit for heavy programming or knowledge-intensive QA without retrieval
- Reasoning-only output adds chain-of-thought tokens to every turn
- Text-only; this variant has no vision or audio input
Key Takeaways
- Liquid AI's LFM2.5-8B-A1B holds 8.3B total parameters but activates only 1.5B per token.
- It is reasoning-only, with a 128K context window and nine-language coverage.
- Non-Hallucination Rate jumped from 7.46 to 63.47 over LFM2-8B-A1B; IFEval reached 91.84.
- It decodes 253 tok/s on an M5 Max under 6 GB, and ~30 tok/s on a phone.
- Day-one support spans llama.cpp, MLX, vLLM, and SGLang, with open base and post-trained weights.
The post Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters appeared first on MarkTechPost.
