Meta FAIR Released Code World Model (CWM): A 32-Billion-Parameter Open-Weights LLM, to Advance Research on Code Generation with World Models

Meta FAIR released Code World Model (CWM), a 32-billion-parameter dense decoder-only LLM that injects world modeling into code generation by training on execution traces and long-horizon agent–environment interactions, not just static source text.
What’s new: learning code by predicting execution
CWM mid-trains on two large families of observation–action trajectories: (1) Python interpreter traces that record local variable states after each executed line, and (2) agentic interactions within Dockerized repositories that capture edits, shell commands, and test feedback. This grounding is intended to teach semantics (how state evolves) rather than only syntax.
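To make the first trajectory type concrete, line-level traces of local variable state can be collected with Python's standard `sys.settrace` hook. The sketch below is illustrative only; CWM's actual trace schema and collection pipeline are defined in the paper, not reproduced here.

```python
import sys

def collect_trace(fn, *args):
    """Record (line number, locals snapshot) for each traced line of fn.

    Note: the 'line' event fires just before a line executes, so each
    snapshot reflects the state after the *previous* line ran. This is a
    minimal sketch of observation-action trace data, not CWM's format.
    """
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trace

def demo(x):
    y = x + 1
    z = y * 2
    return z

result, trace = collect_trace(demo, 3)
print(result)      # 8
print(len(trace))  # one entry per traced line of demo
```

Each trace entry pairs a line number with a dictionary of locals, which is the kind of state-evolution signal the article describes.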
To scale collection, the research team built executable repository images from thousands of GitHub projects and foraged multi-step trajectories via a software-engineering agent (“ForagerAgent”). The release reports ~3M trajectories across ~10k images and 3.15k repos, with mutate-fix and issue-fix variants.

Model and context window
CWM is a dense, decoder-only Transformer (no MoE) with 64 layers, GQA (48 query heads / 8 KV heads), SwiGLU, RMSNorm, and scaled RoPE. Attention alternates local 8k and global 131k sliding-window blocks, yielding an effective context of 131k tokens; training uses document-causal masking.
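The alternating window schedule can be sketched as a simple per-layer pattern, using the release's stated 3:1 local:global interleave over 64 layers with 8k local and 131k global windows. The exact layer ordering here is an illustrative assumption, not the released configuration.

```python
LOCAL_WINDOW = 8_192     # local sliding-window size (8k tokens)
GLOBAL_WINDOW = 131_072  # global sliding-window size (131k tokens)
NUM_LAYERS = 64

def window_schedule(num_layers=NUM_LAYERS,
                    pattern=(LOCAL_WINDOW,) * 3 + (GLOBAL_WINDOW,)):
    """Sliding-window size per layer under an assumed 3:1 interleave."""
    return [pattern[i % len(pattern)] for i in range(num_layers)]

schedule = window_schedule()
print(schedule[:4])                    # three local blocks, then one global
print(schedule.count(GLOBAL_WINDOW))   # 16 global layers out of 64
```

Under this pattern, one in four layers attends over the full 131k window, which is what lets the model reach 131k effective context while most layers stay cheap.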
Training recipe (pre → mid → post)
- General pretraining: 8T tokens (code-heavy) at 8k context.
- Mid-training: +5T tokens, long-context (131k) with Python execution traces, ForagerAgent data, PR-derived diffs, IR/compilers, Triton kernels, and Lean math.
- Post-training: 100B-token SFT for instruction following and reasoning, then multi-task RL (~172B tokens) across verifiable coding, math, and multi-turn SWE environments using a GRPO-style algorithm and a minimal toolset (bash/edit/create/submit).
- Quantized inference fits on a single 80 GB H100.
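The RL stage uses a GRPO-style algorithm. A core ingredient of GRPO is computing advantages relative to a group of sampled completions for the same prompt, rather than via a learned value function. The sketch below shows only that group-relative normalization step; CWM's exact objective, clipping, and hyperparameters are in the paper.

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each sampled completion's
    reward against its group's mean and standard deviation.

    A minimal sketch of the GRPO-style idea, not CWM's implementation.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one prompt, scored by a verifiable reward (e.g. tests pass/fail):
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # passing rollouts get positive advantage, failing ones negative
```

Because the baseline is the group mean, a verifiable binary reward (tests passed or not) is enough to produce a useful learning signal without a critic network.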
Benchmarks
The research team cites the following pass@1 scores (test-time scaling noted where applicable):
- SWE-bench Verified: 65.8% (with test-time scaling).
- LiveCodeBench-v5: 68.6%; LCB-v6: 63.5%.
- Math-500: 96.6%; AIME-24: 76.0%; AIME-25: 68.2%.
- CruxEval-Output: 94.3%.
The research team positions CWM as competitive with similarly sized open-weights baselines, and even with larger or closed models on SWE-bench Verified.
For context on SWE-bench Verified’s task design and metrics, see the official benchmark resources.

Why world modeling matters for code
The release emphasizes two operational capabilities:
- Execution-trace prediction: given a function and a trace start, CWM predicts stack frames (locals) and the executed line at each step via a structured format, usable as a “neural debugger” for grounded reasoning without live execution.
- Agentic coding: multi-turn reasoning with tool use against real repos, verified by hidden tests and patch-similarity rewards; the setup trains the model to localize faults and generate end-to-end patches (git diff) rather than snippets.
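A reward that combines hidden-test outcomes with patch similarity could be sketched as below. The particular combination (a weighted sum, with `difflib` as the similarity measure) is an illustrative assumption; CWM's actual reward function is specified in the paper.

```python
import difflib

def patch_reward(candidate_diff, reference_diff,
                 tests_passed, tests_total, w_tests=0.8):
    """Blend hidden-test pass rate with textual similarity to a reference
    patch. Weighting and similarity metric are hypothetical, for
    illustration only.
    """
    test_score = tests_passed / tests_total if tests_total else 0.0
    sim = difflib.SequenceMatcher(None, candidate_diff, reference_diff).ratio()
    return w_tests * test_score + (1 - w_tests) * sim

# A candidate patch that passes all hidden tests and matches the reference:
r = patch_reward("-a+b\n+a - b\n", "-a+b\n+a - b\n",
                 tests_passed=3, tests_total=3)
print(r)
```

The similarity term gives partial credit when tests fail but the candidate edit is close to the reference fix, which helps shape learning on hard repository tasks.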
Some details worth noting
- Tokenizer: Llama-3 family with reserved control tokens; reserved IDs are used to demarcate trace and reasoning segments during SFT.
- Attention structure: the 3:1 local:global interleave is repeated across the depth; long-context training occurs at large token batch sizes to stabilize gradients.
- Compute scaling: learning-rate/batch-size schedules are derived from internal scaling-law sweeps tailored for long-context overheads.
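The reserved-token demarcation in the first bullet amounts to wrapping trace segments in sentinel tokens so the model can tell them apart from code and prose. The token strings below are hypothetical placeholders; the real reserved IDs come from the released Llama-3-family tokenizer, not from this sketch.

```python
# Hypothetical sentinel strings standing in for reserved control-token IDs.
TRACE_START = "<|reserved_trace_start|>"
TRACE_END = "<|reserved_trace_end|>"

def wrap_trace_segment(trace_text):
    """Demarcate a trace segment within an SFT training example."""
    return f"{TRACE_START}{trace_text}{TRACE_END}"

sample = wrap_trace_segment("line 2: {'x': 3, 'y': 4}")
print(sample)
```

Because the sentinels are reserved IDs that never occur in ordinary text, the model can learn segment boundaries without ambiguity.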
Summary
CWM is a practical step toward grounded code generation: Meta ties a 32B dense transformer to execution-trace learning and agentic, test-verified patching, releases intermediate and post-trained checkpoints, and gates usage under the FAIR Non-Commercial Research License, making it a useful platform for reproducible ablations on long-context, execution-aware coding without conflating research with production deployment.
Check out the Paper, GitHub Page, and Model on Hugging Face.
The post Meta FAIR Released Code World Model (CWM): A 32-Billion-Parameter Open-Weights LLM, to Advance Research on Code Generation with World Models appeared first on MarkTechPost.