Allen Institute for AI (AI2) Introduces Olmo 3: An Open Source 7B and 32B LLM Family Built on the Dolma 3 and Dolci Stack
Allen Institute for AI (AI2) is releasing Olmo 3 as a fully open model family that exposes the entire 'model flow', from raw data and code to intermediate checkpoints and deployment-ready variants.
Olmo 3 is a dense transformer suite with 7B and 32B parameter models. The family consists of Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero. Both the 7B and 32B variants share a context length of 65,536 tokens and use the same staged training recipe.

Dolma 3 Data Suite
At the core of the training pipeline is Dolma 3, a new data collection designed for Olmo 3. Dolma 3 comprises Dolma 3 Mix, Dolma 3 Dolmino Mix, and Dolma 3 Longmino Mix. Dolma 3 Mix is a 5.9T-token pretraining dataset with web text, scientific PDFs, code repositories, and other natural data. The Dolmino and Longmino subsets are built from filtered, higher-quality slices of this pool.
Dolma 3 Mix drives the main pretraining stage for Olmo 3-Base. The AI2 research team then applies Dolma 3 Dolmino Mix, a 100B-token midtraining set that emphasizes math, code, instruction following, reading comprehension, and thinking-oriented tasks. Finally, Dolma 3 Longmino Mix adds 50B tokens for the 7B model and 100B tokens for the 32B model, with a strong focus on long documents and scientific PDFs processed with the olmOCR pipeline. This staged curriculum is what pushes the context limit to 65,536 tokens while maintaining stability and quality.
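As a rough sanity check on the staged curriculum, the reported per-stage token budgets can be tallied. This is illustrative arithmetic based only on the figures above, not an official training configuration:

```python
# Reported token budgets for the three-stage Olmo 3 curriculum (7B model):
# 5.9T pretraining, 100B midtraining, 50B long-context extension.
stages_7b = {
    "Dolma 3 Mix (pretraining)": 5.9e12,
    "Dolma 3 Dolmino Mix (midtraining)": 100e9,
    "Dolma 3 Longmino Mix (long context)": 50e9,  # 100B for the 32B model
}

total = sum(stages_7b.values())
print(f"Total 7B training tokens: {total:.3e}")  # ~6.05e12
for name, tokens in stages_7b.items():
    print(f"{name}: {tokens / total:.2%} of budget")
```

The split makes the design visible: the mid- and long-context stages together are under 3% of the total budget, which is why careful data curation at those stages matters so much.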
Large Scale Training on H100 Clusters
Olmo 3-Base 7B trains on Dolma 3 Mix using 1,024 H100 GPUs, reaching about 7,700 tokens per device per second. Later stages use 128 H100s for Dolmino midtraining and 256 H100s for Longmino long-context extension.
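Those throughput figures imply a rough wall-clock estimate for the main pretraining stage. The following back-of-the-envelope calculation assumes the reported per-device rate holds steady and ignores restarts, stragglers, and evaluation pauses:

```python
# Back-of-the-envelope pretraining time for Olmo 3-Base 7B, using the
# reported figures: 1,024 H100s at ~7,700 tokens/device/sec over the
# 5.9T-token Dolma 3 Mix.
devices = 1024
tokens_per_device_per_sec = 7700
pretraining_tokens = 5.9e12

cluster_tokens_per_sec = devices * tokens_per_device_per_sec  # ~7.9M tokens/sec
seconds = pretraining_tokens / cluster_tokens_per_sec
days = seconds / 86_400

print(f"Aggregate throughput: {cluster_tokens_per_sec:.2e} tokens/sec")
print(f"Estimated pretraining time: {days:.1f} days")  # roughly 8-9 days
```

Under these idealized assumptions the main pretraining run fits in well under two weeks of cluster time, which helps explain how the staged recipe stays practical at this scale.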
Base Model Performance Against Open Families
On standard capability benchmarks, Olmo 3-Base 32B is positioned as a leading fully open base model. The AI2 research team reports that it is competitive with prominent open-weight families such as Qwen 2.5 and Gemma 3 at similar sizes. Compared across a wide suite of tasks, Olmo 3-Base 32B ranks near or above these models while keeping the full data and training configuration open for inspection and reuse.
Reasoning-Focused Olmo 3-Think
Olmo 3-Think 7B and Olmo 3-Think 32B sit on top of the base models as reasoning-focused variants. They use a three-stage post-training recipe that combines supervised fine-tuning, Direct Preference Optimization, and Reinforcement Learning with Verifiable Rewards within the OlmoRL framework. Olmo 3-Think 32B is described as the strongest fully open reasoning model, and it narrows the gap to the Qwen 3 32B thinking models while using about six times fewer training tokens.
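To make the middle stage of that recipe concrete, here is a minimal sketch of the Direct Preference Optimization loss on a single preference pair. It follows the published DPO formulation rather than AI2's actual training code; the log-probabilities and the β value below are placeholder inputs:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# If the policy has not moved from the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ~0.6931
# Favoring the chosen response relative to the reference drives the loss down.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The appeal of DPO in a pipeline like this is that it optimizes preferences directly from pairs, without training a separate reward model, before the final RLVR stage adds verifiable-reward signals.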

Olmo 3-Instruct for Chat and Tool Use
Olmo 3-Instruct 7B is tuned for fast instruction following, multi-turn chat, and tool use. It starts from Olmo 3-Base 7B and applies a separate Dolci Instruct data and training pipeline that covers supervised fine-tuning, DPO, and RLVR for conversational and function-calling workloads. The AI2 research team reports that Olmo 3-Instruct matches or outperforms open-weight competitors such as Qwen 2.5, Gemma 3, and Llama 3.1, and is competitive with the Qwen 3 family at similar scales on a number of instruction and reasoning benchmarks.
RL Zero for Clean RL Research
Olmo 3-RL Zero 7B is designed for researchers who care about reinforcement learning on language models but need clean separation between pretraining data and RL data. It is built as a fully open RL pathway on top of Olmo 3-Base and uses Dolci RL Zero datasets that are decontaminated with respect to Dolma 3.
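The decontamination idea can be illustrated with a simple n-gram overlap check, a common way to flag RL or eval examples that leak into pretraining data. This is a generic sketch under that assumption, not AI2's actual tooling:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Lowercased word n-grams of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc: str, eval_example: str, n: int = 8) -> bool:
    """Flag the pair if any word n-gram of the eval example
    appears verbatim in the training document."""
    return bool(ngrams(train_doc, n) & ngrams(eval_example, n))

train = "the quick brown fox jumps over the lazy dog near the river bank"
leaked = "we saw the quick brown fox jumps over the lazy dog yesterday"
clean = "completely unrelated sentence about training language models at scale"

print(is_contaminated(train, leaked, n=6))  # True
print(is_contaminated(train, clean, n=6))   # False
```

Running a check like this in the other direction, from RL data against the pretraining corpus, is what gives "RL Zero" experiments a defensible claim that gains come from RL rather than memorized pretraining text.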
Comparison Table
| Model variant | Training or post-training data | Primary use case | Reported position vs other open models |
|---|---|---|---|
| Olmo 3-Base 7B | Dolma 3 Mix pretraining, Dolma 3 Dolmino Mix midtraining, Dolma 3 Longmino Mix long context | General foundation model, long-context reasoning, code, math | Strong fully open 7B base, designed as the foundation for Think, Instruct, and RL Zero, evaluated against leading open 7B-scale bases |
| Olmo 3-Base 32B | Same Dolma 3 staged pipeline as 7B, with 100B Longmino tokens for long context | High-end base for research, long-context workloads, RL setups | Described as the best fully open 32B base, comparable to Qwen 2.5 32B and Gemma 3 27B and outperforming Marin, Apertus, LLM360 |
| Olmo 3-Think 7B | Olmo 3-Base 7B, plus Dolci Think SFT, Dolci Think DPO, Dolci Think RL in the OlmoRL framework | Reasoning-focused 7B model with internal thinking traces | Fully open reasoning model at an efficient scale that enables chain-of-thought research and RL experiments on modest hardware |
| Olmo 3-Think 32B | Olmo 3-Base 32B, plus the same Dolci Think SFT, DPO, RL pipeline | Flagship reasoning model with long thinking traces | Stated as the strongest fully open thinking model, competitive with Qwen 3 32B thinking models while training on about 6x fewer tokens |
| Olmo 3-Instruct 7B | Olmo 3-Base 7B, plus Dolci Instruct SFT, Dolci Instruct DPO, Dolci Instruct RL 7B | Instruction following, chat, function calling, tool use | Reported to outperform Qwen 2.5, Gemma 3, and Llama 3.1 and to narrow the gap to the Qwen 3 family at similar scale |
| Olmo 3-RL Zero 7B | Olmo 3-Base 7B, plus Dolci RLZero Math, Code, IF, and Mix datasets, decontaminated from Dolma 3 | RLVR research on math, code, instruction following, mixed tasks | Introduced as a fully open RL pathway for benchmarking RLVR on top of a base model with fully open pretraining data |
Key Takeaways
- End-to-end transparent pipeline: Olmo 3 exposes the full 'model flow' from Dolma 3 data construction, through staged pretraining and post-training, to released checkpoints, evaluation suites, and tooling, enabling fully reproducible LLM research and fine-grained debugging.
- Dense 7B and 32B models with 65K context: The family covers 7B and 32B dense transformers, all with a 65,536-token context window, trained via a three-stage Dolma 3 curriculum: Dolma 3 Mix for main pretraining, Dolma 3 Dolmino for midtraining, and Dolma 3 Longmino for long-context extension.
- Strong open base and reasoning models: Olmo 3-Base 32B is positioned as a top fully open base model at its scale, competitive with Qwen 2.5 and Gemma 3, while Olmo 3-Think 32B is described as the strongest fully open thinking model and approaches the Qwen 3 32B thinking models using about 6 times fewer training tokens.
- Task-tuned Instruct and RL Zero variants: Olmo 3-Instruct 7B targets instruction following, multi-turn chat, and tool use via Dolci Instruct SFT, DPO, and RLVR data, and is reported to match or outperform Qwen 2.5, Gemma 3, and Llama 3.1 at similar scale. Olmo 3-RL Zero 7B provides a fully open RLVR pathway with Dolci RLZero datasets decontaminated from pretraining data for math, code, instruction following, and general chat.
Editorial Comments
Olmo 3 is an unusual release because it operationalizes openness across the full stack: Dolma 3 data recipes, staged pretraining, Dolci post-training, RLVR in OlmoRL, and evaluation with OLMES and OlmoBaseEval. This reduces ambiguity around data quality, long-context training, and reasoning-oriented RL, and it creates a concrete baseline for extending Olmo 3-Base, Olmo 3-Think, Olmo 3-Instruct, and Olmo 3-RL Zero in controlled experiments. Overall, Olmo 3 sets a rigorous reference point for transparent, research-grade LLM pipelines.
The post "Allen Institute for AI (AI2) Introduces Olmo 3: An Open Source 7B and 32B LLM Family Built on the Dolma 3 and Dolci Stack" appeared first on MarkTechPost.
