Allen Institute for AI (AI2) Introduces Olmo 3: An Open Source 7B and 32B LLM Family Built on the Dolma 3 and Dolci Stack
Allen Institute for AI (AI2) is releasing Olmo 3 as a fully open model family that exposes the entire 'model flow', from raw data and code to intermediate checkpoints and deployment-ready variants.
Olmo 3 is a dense transformer suite with 7B and 32B parameter models. The family consists of Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero. Both the 7B and 32B variants share a context length of 65,536 tokens and use the same staged training recipe.

Dolma 3 Data Suite
At the core of the training pipeline is Dolma 3, a new data collection designed for Olmo 3. Dolma 3 comprises Dolma 3 Mix, Dolma 3 Dolmino Mix, and Dolma 3 Longmino Mix. Dolma 3 Mix is a 5.9T-token pretraining dataset with web text, scientific PDFs, code repositories, and other natural data. The Dolmino and Longmino subsets are built from filtered, higher-quality slices of this pool.
Dolma 3 Mix drives the main pretraining stage for Olmo 3-Base. The AI2 research team then applies Dolma 3 Dolmino Mix, a 100B-token midtraining set that emphasizes math, code, instruction following, reading comprehension, and thinking-oriented tasks. Finally, Dolma 3 Longmino Mix adds 50B tokens for the 7B model and 100B tokens for the 32B model, with a strong focus on long documents and scientific PDFs processed with the olmOCR pipeline. This staged curriculum is what pushes the context limit to 65,536 tokens while maintaining stability and quality.
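As a rough sanity check on the staged curriculum, the reported per-stage token budgets can be tallied. This is illustrative arithmetic based only on the figures above, not an official training configuration:

```python
# Reported token budgets for the three-stage Olmo 3 curriculum (7B model):
# 5.9T pretraining, 100B midtraining, 50B long-context extension.
stages_7b = {
    "Dolma 3 Mix (pretraining)": 5.9e12,
    "Dolma 3 Dolmino Mix (midtraining)": 100e9,
    "Dolma 3 Longmino Mix (long context)": 50e9,  # 100B for the 32B model
}

total = sum(stages_7b.values())
print(f"Total 7B training tokens: {total:.3e}")  # ~6.05e12
for name, tokens in stages_7b.items():
    print(f"{name}: {tokens / total:.2%} of budget")
```

The split makes the design visible: the mid- and long-context stages together are under 3% of the total budget, which is why careful data curation at those stages matters so much.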
Large Scale Training on H100 Clusters
Olmo 3-Base 7B trains on Dolma 3 Mix using 1,024 H100 GPUs, reaching about 7,700 tokens per device per second. Later stages use 128 H100s for Dolmino midtraining and 256 H100s for Longmino long-context extension.
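Those throughput figures imply a rough wall-clock estimate for the main pretraining stage. The following back-of-the-envelope calculation assumes the reported per-device rate holds steady and ignores restarts, stragglers, and evaluation pauses:

```python
# Back-of-the-envelope pretraining time for Olmo 3-Base 7B, using the
# reported figures: 1,024 H100s at ~7,700 tokens/device/sec over the
# 5.9T-token Dolma 3 Mix.
devices = 1024
tokens_per_device_per_sec = 7700
pretraining_tokens = 5.9e12

cluster_tokens_per_sec = devices * tokens_per_device_per_sec  # ~7.9M tokens/sec
seconds = pretraining_tokens / cluster_tokens_per_sec
days = seconds / 86_400

print(f"Aggregate throughput: {cluster_tokens_per_sec:.2e} tokens/sec")
print(f"Estimated pretraining time: {days:.1f} days")  # roughly 8-9 days
```

Under these idealized assumptions the main pretraining run fits in well under two weeks of cluster time, which helps explain how the staged recipe stays practical at this scale.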
Base Model Performance Against Open Families
On standard capability benchmarks, Olmo 3-Base 32B is positioned as a leading fully open base model. The AI2 research team reports that it is competitive with prominent open-weight families such as Qwen 2.5 and Gemma 3 at similar sizes. Compared across a wide suite of tasks, Olmo 3-Base 32B ranks near or above these models while keeping the full data and training configuration open for inspection and reuse.
Reasoning-Focused Olmo 3-Think
Olmo 3-Think 7B and Olmo 3-Think 32B sit on top of the base models as reasoning-focused variants. They use a three-stage post-training recipe that combines supervised fine-tuning, Direct Preference Optimization, and Reinforcement Learning with Verifiable Rewards within the OlmoRL framework. Olmo 3-Think 32B is described as the strongest fully open reasoning model, and it narrows the gap to the Qwen 3 32B thinking models while using about six times fewer training tokens.
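To make the middle stage of that recipe concrete, here is a minimal sketch of the Direct Preference Optimization loss on a single preference pair. It follows the published DPO formulation rather than AI2's actual training code; the log-probabilities and the β value below are placeholder inputs:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# If the policy has not moved from the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ~0.6931
# Favoring the chosen response relative to the reference drives the loss down.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The appeal of DPO in a pipeline like this is that it optimizes preferences directly from pairs, without training a separate reward model, before the final RLVR stage adds verifiable-reward signals.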

Olmo 3-Instruct for Chat and Tool Use
Olmo 3-Instruct 7B is tuned for fast instruction following, multi-turn chat, and tool use. It starts from Olmo 3-Base 7B and applies a separate Dolci Instruct data and training pipeline that covers supervised fine-tuning, DPO, and RLVR for conversational and function-calling workloads. The AI2 research team reports that Olmo 3-Instruct matches or outperforms open-weight competitors such as Qwen 2.5, Gemma 3, and Llama 3.1, and is competitive with the Qwen 3 family at similar scales on a number of instruction and reasoning benchmarks.
RL Zero for Clean RL Research
Olmo 3-RL Zero 7B is designed for researchers who care about reinforcement learning on language models but need clean separation between pretraining data and RL data. It is built as a fully open RL pathway on top of Olmo 3-Base and uses Dolci RL Zero datasets that are decontaminated with respect to Dolma 3.
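The decontamination idea can be illustrated with a simple n-gram overlap check, a common way to flag RL or eval examples that leak into pretraining data. This is a generic sketch under that assumption, not AI2's actual tooling:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Lowercased word n-grams of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc: str, eval_example: str, n: int = 8) -> bool:
    """Flag the pair if any word n-gram of the eval example
    appears verbatim in the training document."""
    return bool(ngrams(train_doc, n) & ngrams(eval_example, n))

train = "the quick brown fox jumps over the lazy dog near the river bank"
leaked = "we saw the quick brown fox jumps over the lazy dog yesterday"
clean = "completely unrelated sentence about training language models at scale"

print(is_contaminated(train, leaked, n=6))  # True
print(is_contaminated(train, clean, n=6))   # False
```

Running a check like this in the other direction, from RL data against the pretraining corpus, is what gives "RL Zero" experiments a defensible claim that gains come from RL rather than memorized pretraining text.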
Comparison Table
| Model variant | Training or post-training data | Primary use case | Reported position vs other open models |
|---|---|---|---|
| Olmo 3-Base 7B | Dolma 3 Mix pretraining, Dolma 3 Dolmino Mix midtraining, Dolma 3 Longmino Mix long context | General foundation model, long-context reasoning, code, math | Strong fully open 7B base, designed as the foundation for Think, Instruct, and RL Zero, evaluated against leading open 7B-scale bases |
| Olmo 3-Base 32B | Same Dolma 3 staged pipeline as 7B, with 100B Longmino tokens for long context | High-end base for research, long-context workloads, RL setups | Described as the best fully open 32B base, comparable to Qwen 2.5 32B and Gemma 3 27B and outperforming Marin, Apertus, LLM360 |
| Olmo 3-Think 7B | Olmo 3-Base 7B, plus Dolci Think SFT, Dolci Think DPO, Dolci Think RL in the OlmoRL framework | Reasoning-focused 7B model with internal thinking traces | Fully open reasoning model at an efficient scale that enables chain-of-thought research and RL experiments on modest hardware |
| Olmo 3-Think 32B | Olmo 3-Base 32B, plus the same Dolci Think SFT, DPO, RL pipeline | Flagship reasoning model with long thinking traces | Stated as the strongest fully open thinking model, competitive with Qwen 3 32B thinking models while training on about 6x fewer tokens |
| Olmo 3-Instruct 7B | Olmo 3-Base 7B, plus Dolci Instruct SFT, Dolci Instruct DPO, Dolci Instruct RL 7B | Instruction following, chat, function calling, tool use | Reported to outperform Qwen 2.5, Gemma 3, and Llama 3.1 and to narrow the gap to the Qwen 3 family at similar scale |
| Olmo 3-RL Zero 7B | Olmo 3-Base 7B, plus Dolci RLZero Math, Code, IF, and Mix datasets, decontaminated from Dolma 3 | RLVR research on math, code, instruction following, mixed tasks | Introduced as a fully open RL pathway for benchmarking RLVR on top of a base model with fully open pretraining data |
Key Takeaways
- End-to-end transparent pipeline: Olmo 3 exposes the full 'model flow' from Dolma 3 data construction, through staged pretraining and post-training, to released checkpoints, evaluation suites, and tooling, enabling fully reproducible LLM research and fine-grained debugging.
- Dense 7B and 32B models with 65K context: The family covers 7B and 32B dense transformers, all with a 65,536-token context window, trained via a three-stage Dolma 3 curriculum: Dolma 3 Mix for main pretraining, Dolma 3 Dolmino for midtraining, and Dolma 3 Longmino for long-context extension.
- Strong open base and reasoning models: Olmo 3-Base 32B is positioned as a top fully open base model at its scale, competitive with Qwen 2.5 and Gemma 3, while Olmo 3-Think 32B is described as the strongest fully open thinking model and approaches the Qwen 3 32B thinking models using about 6 times fewer training tokens.
- Task-tuned Instruct and RL Zero variants: Olmo 3-Instruct 7B targets instruction following, multi-turn chat, and tool use via Dolci Instruct SFT, DPO, and RLVR data, and is reported to match or outperform Qwen 2.5, Gemma 3, and Llama 3.1 at similar scale. Olmo 3-RL Zero 7B provides a fully open RLVR pathway with Dolci RLZero datasets decontaminated from pretraining data for math, code, instruction following, and general chat.
Editorial Comments
Olmo 3 is an unusual release because it operationalizes openness across the full stack: Dolma 3 data recipes, staged pretraining, Dolci post-training, RLVR in OlmoRL, and evaluation with OLMES and OlmoBaseEval. This reduces ambiguity around data quality, long-context training, and reasoning-oriented RL, and it creates a concrete baseline for extending Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero in controlled experiments. Overall, Olmo 3 sets a rigorous reference point for transparent, research-grade LLM pipelines.
The post "Allen Institute for AI (AI2) Introduces Olmo 3: An Open Source 7B and 32B LLM Family Built on the Dolma 3 and Dolci Stack" appeared first on MarkTechPost.
