
Baidu Releases ERNIE-4.5-21B-A3B-Thinking: A Compact MoE Model for Deep Reasoning

Baidu's AI research team has released ERNIE-4.5-21B-A3B-Thinking, a new reasoning-focused large language model designed around efficiency, long-context reasoning, and tool integration. Part of the ERNIE-4.5 family, the model uses a Mixture-of-Experts (MoE) architecture with 21B total parameters but only 3B active parameters per token, making it computationally efficient while maintaining competitive reasoning capability. Released under the Apache-2.0 license, it is available for both research and commercial deployment via Hugging Face.

What is the architectural design of ERNIE-4.5-21B-A3B-Thinking?

ERNIE-4.5-21B-A3B-Thinking is built on a Mixture-of-Experts backbone. Instead of activating all 21B parameters, the router selects a subset of experts, resulting in 3B active parameters per token. This structure reduces computation without compromising the specialization of individual experts. The research team applies a router orthogonalization loss and a token-balanced loss to encourage diverse expert activation and stable training.

This design offers a middle ground between small dense models and ultra-large systems. The research team's working assumption is that roughly 3B active parameters per token may be a practical sweet spot for balancing reasoning performance against deployment efficiency.
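To make the sparse-activation idea concrete, here is a minimal top-k MoE routing sketch in PyTorch. The layer sizes, expert count, and top-k value are illustrative placeholders, not ERNIE-4.5's actual configuration, and the router losses mentioned above are omitted for brevity.

```python
# Minimal top-k MoE routing sketch. All dimensions are illustrative, not ERNIE's.
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=256, d_ff=512, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (num_tokens, d_model)
        logits = self.router(x)                             # (num_tokens, num_experts)
        weights, idx = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Only the routed experts run for each token, so the "active" parameter
        # count per token is a small fraction of the layer's total parameters.
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 256)
print(layer(tokens).shape)  # torch.Size([8, 256])
```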

How does the model handle long-context reasoning?

A defining capability of ERNIE-4.5-21B-A3B-Thinking is its 128K context length. This allows the model to process very long documents, perform extended multi-step reasoning, and incorporate structured data sources such as academic papers or multi-file codebases.

The research team achieves this through progressive scaling of Rotary Position Embeddings (RoPE), gradually increasing the frequency base from 10K up to 500K during training. Additional optimizations, including FlashMask attention and memory-efficient scheduling, make these long-context operations computationally feasible.
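The sketch below illustrates why raising the RoPE frequency base helps: a larger base stretches the wavelength of the slowest-rotating dimensions, so positions far apart in a 128K-token window remain distinguishable. The 10K and 500K values come from the article; the intermediate step and the helper function are assumptions added purely for illustration, not ERNIE's actual training schedule.

```python
# Illustrative RoPE inverse-frequency computation under different bases.
import math
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    """Inverse rotation frequencies used by rotary position embeddings."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

for base in (10_000, 100_000, 500_000):
    inv_freq = rope_inv_freq(head_dim=128, base=base)
    # Wavelength (in tokens) of the slowest-rotating dimension: larger bases give
    # longer wavelengths, which is what long-context scaling relies on.
    slowest_wavelength = 2 * math.pi / inv_freq[-1].item()
    print(f"base={base:>7,} -> slowest wavelength ≈ {slowest_wavelength:,.0f} tokens")
```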

What training strategy supports its reasoning?

The model follows the multi-stage recipe defined for the ERNIE-4.5 family:

  1. Stage I – Text-only pretraining builds the core language backbone, starting with an 8K context and expanding to 128K.
  2. Stage II – Vision training is skipped for this text-only variant.
  3. Stage III – Joint multimodal training is not used here, as A3B-Thinking is purely textual.

Post-training focuses on reasoning tasks. The research team applies Supervised Fine-Tuning (SFT) across mathematics, logic, coding, and science, followed by Progressive Reinforcement Learning (PRL). The reinforcement stages begin with logic, then extend to mathematics and programming, and finally to broader reasoning tasks. This is enhanced by Unified Preference Optimization (UPO), which integrates preference learning with PPO to stabilize alignment and reduce reward hacking.
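The staged order can be summarized as a simple schedule. The sketch below is illustrative scaffolding only: the stage names and domains follow the article, but it is a description of the curriculum order, not Baidu's actual training pipeline.

```python
# Illustrative summary of the post-training curriculum described above.
POST_TRAINING_PLAN = [
    ("SFT", ["mathematics", "logic", "coding", "science"]),  # supervised fine-tuning
    ("PRL", ["logic"]),                                      # progressive RL begins with logic...
    ("PRL", ["mathematics", "programming"]),                 # ...then extends to math and code...
    ("PRL", ["broader reasoning"]),                          # ...and finally broader reasoning tasks
    ("UPO", ["preference data"]),                            # preference learning integrated with PPO
]

for stage, domains in POST_TRAINING_PLAN:
    print(f"{stage}: {', '.join(domains)}")
```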

What role does tool usage play in this model?

ERNIE-4.5-21B-A3B-Thinking supports structured tool and function calling, making it useful in scenarios where external computation or retrieval is required. Developers can integrate it with vLLM, Transformers 4.54+, and FastDeploy. This tool-use capability is particularly well suited to program synthesis, symbolic reasoning, and multi-agent workflows.

Built-in function calling allows the model to reason over long contexts while dynamically invoking external APIs, a key requirement for applied reasoning in enterprise systems.
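As a rough starting point, here is a minimal sketch of function calling through Hugging Face Transformers (the article cites Transformers 4.54+). It assumes the model's chat template accepts a `tools` list; the `search_papers` tool schema and the prompt are hypothetical examples, not part of Baidu's documentation.

```python
# Hedged sketch: structured tool calling via Transformers' chat-template tools argument.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-21B-A3B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Hypothetical tool definition in the standard JSON-schema format.
search_papers = {
    "type": "function",
    "function": {
        "name": "search_papers",
        "description": "Look up academic paper titles matching a query and year.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "year": {"type": "integer", "description": "Publication year"},
            },
            "required": ["query"],
        },
    },
}

messages = [{"role": "user", "content": "Find recent papers on sparse MoE routing and summarize the trends."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[search_papers],        # exposed to the model through its chat template
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

In practice, the generated text would contain a structured tool call that the host application parses, executes, and feeds back to the model as a tool-result message.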

How does ERNIE-4.5-21B-A3B-Thinking perform on reasoning benchmarks?

The model shows strong performance improvements across logical reasoning, mathematics, scientific QA, and programming tasks. In evaluations, it demonstrates:

  • Improved accuracy on multi-step reasoning datasets, where long chains of thought are required.
  • Competitiveness with larger dense models on STEM reasoning tasks.
  • Stable text generation and academic-synthesis performance, benefiting from extended-context training.

These results suggest that the MoE structure amplifies reasoning specialization, delivering efficiency without requiring trillion-scale dense parameter counts.

https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking

How does it compare to other reasoning-focused LLMs?

This release enters a landscape that includes OpenAI's o3, Anthropic's Claude 4, DeepSeek-R1, and Qwen-3. Many of these competitors rely on dense architectures or larger active parameter counts. The Baidu research team's choice of a compact MoE with 3B active parameters offers a different balance:

  • Scalability: sparse activation reduces compute overhead while scaling expert capacity.
  • Long-context readiness: the 128K context is trained in directly, not retrofitted.
  • Commercial openness: the Apache-2.0 license lowers adoption friction for enterprises.

Summary

ERNIE-4.5-21B-A3B-Thinking demonstrates how deep reasoning can be achieved without massive dense parameter counts. By combining efficient MoE routing, 128K-context training, and tool integration, Baidu's research team delivers a model that balances research-grade reasoning with deployment feasibility.

