
OpenAI Debuts GPT-5.1-Codex-Max, a Long-Horizon Agentic Coding Model With Compaction for Multi-Window Workflows

OpenAI has launched GPT-5.1-Codex-Max, a frontier agentic coding model designed for long-running software engineering tasks that span millions of tokens and multi-hour sessions. It is available today inside Codex in the CLI, IDE extension, cloud integration, and code review surfaces, with API access planned soon.

What is GPT-5.1-Codex-Max optimised for?

GPT-5.1-Codex-Max is built on an update to OpenAI’s foundational reasoning model. This base model is trained on agentic tasks across software engineering, math, research, and other domains. On top of this, GPT-5.1-Codex-Max is trained on real-world software engineering workloads such as PR creation, code review, frontend coding, and Q&A.

The model targets frontier coding evaluations rather than general chat. GPT-5.1-Codex-Max and the broader Codex family are recommended only for agentic coding tasks in Codex or Codex-like environments, not as a drop-in replacement for GPT-5.1 in general-purpose conversations.

GPT-5.1-Codex-Max is also the first Codex model trained to operate in Windows environments. Its training includes tasks that make it a better collaborator in the Codex CLI, including improved behaviour when running commands and working with files under the Codex sandbox.

Compaction and long-running tasks

A core feature of GPT-5.1-Codex-Max is compaction. The model still runs within a fixed context window, but it is natively trained to work across multiple context windows by pruning its interaction history while preserving the most important information over long horizons.

In Codex applications, GPT-5.1-Codex-Max automatically compacts its session when it approaches the context-window limit. It creates a fresh context window that retains the essential state of the task, then continues execution. This process repeats until the task completes.
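
OpenAI has not published the compaction mechanism itself, but the behaviour described above can be sketched as a simple loop: when the transcript nears the window limit, compress it into a compact state and continue in a fresh window. Everything below (`CONTEXT_LIMIT`, `summarise`, `run_agent`, the toy tokenizer) is an illustrative assumption, not Codex internals.

```python
# Illustrative sketch of a compaction loop for a long-running agent.
# This is NOT OpenAI's implementation; it only mirrors the described
# behaviour: prune history near the context limit, keep essential state.

CONTEXT_LIMIT = 100        # token budget per window (toy number)
COMPACT_THRESHOLD = 0.8    # compact once 80% of the window is used

def count_tokens(history):
    """Toy tokenizer: one token per whitespace-separated word."""
    return sum(len(turn.split()) for turn in history)

def summarise(history):
    """Stand-in for model-generated compaction: carry only the most
    recent turn forward as the 'essential state' of the task."""
    return [f"[compacted state] {history[-1]}"]

def run_agent(task_steps):
    """Process steps, compacting into a fresh window whenever the
    accumulated history approaches the context limit."""
    history = []
    windows_used = 1
    for step in task_steps:
        history.append(step)
        if count_tokens(history) > COMPACT_THRESHOLD * CONTEXT_LIMIT:
            history = summarise(history)   # fresh window, essential state only
            windows_used += 1
    return history, windows_used

if __name__ == "__main__":
    steps = [f"step {i}: edit files and run tests" for i in range(40)]
    final_history, windows = run_agent(steps)
    print(f"task spanned {windows} context windows")
```

The key design point the announcement describes is that this compression is learned by the model rather than bolted on by the harness, which is what lets a single task survive across many windows.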

OpenAI reports internal evaluations in which GPT-5.1-Codex-Max works independently for more than 24 hours on a single task. During these runs, the model iterates on its implementation, fixes failing tests, and eventually produces a successful result.

https://openai.com/index/gpt-5-1-codex-max/

Reasoning effort, speed, and token efficiency

GPT-5.1-Codex-Max uses the same reasoning-effort control introduced with GPT-5.1, but tuned for coding agents. Reasoning effort selects how many thinking tokens the model uses before committing to an answer.

On SWE-bench Verified, GPT-5.1-Codex-Max with medium reasoning effort achieves higher accuracy than GPT-5.1-Codex at the same effort while using 30% fewer thinking tokens. For non-latency-sensitive tasks, OpenAI introduces a new Extra High reasoning effort, written as xhigh, that lets the model think longer to reach better answers. Medium remains the recommended setting for most workloads.
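
As a rough mental model, reasoning effort acts like a knob on the thinking-token budget the model spends before answering. The level names medium, high, and xhigh come from the announcement; the budget numbers and the `thinking_budget` helper below are invented for exposition only.

```python
# Purely illustrative: hypothetical thinking-token budgets per
# reasoning-effort level. The level names come from the announcement;
# the numbers do not.
EFFORT_BUDGETS = {
    "medium": 4_000,   # recommended default for most workloads
    "high": 16_000,
    "xhigh": 64_000,   # new Extra High level for non-latency-sensitive tasks
}

def thinking_budget(effort: str) -> int:
    """Return the hypothetical thinking-token budget for an effort level."""
    if effort not in EFFORT_BUDGETS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return EFFORT_BUDGETS[effort]

# The token-efficiency claim as arithmetic: at medium effort, Codex-Max
# reportedly spends 30% fewer thinking tokens than GPT-5.1-Codex.
baseline_tokens = 10_000                      # hypothetical baseline spend
codex_max_tokens = baseline_tokens * 7 // 10  # 30% fewer
print(codex_max_tokens)  # 7000
```

The practical trade-off is the usual one: higher effort buys accuracy at the cost of latency and tokens, which is why medium stays the default and xhigh is reserved for tasks where waiting longer is acceptable.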

These changes show up in benchmark results. With GPT-5.1-Codex evaluated at high reasoning effort and GPT-5.1-Codex-Max at xhigh, OpenAI reports the following scores on the 500 problems of SWE-bench Verified: 73.7% for GPT-5.1-Codex and 77.9% for GPT-5.1-Codex-Max. On SWE-Lancer IC SWE, the scores are 66.3% and 79.9%. On Terminal-Bench 2.0, scores are 52.8% and 58.1%. All evaluations run with compaction enabled, and Terminal-Bench 2.0 uses the Codex CLI inside the Laude Institute Harbor harness.
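
Those gains can be restated as absolute and relative improvements. The snippet below only re-computes from the scores quoted above; it introduces no new data.

```python
# Scores reported by OpenAI: (GPT-5.1-Codex at high effort,
# GPT-5.1-Codex-Max at xhigh effort), compaction enabled in all runs.
scores = {
    "SWE-bench Verified (500 problems)": (73.7, 77.9),
    "SWE-Lancer IC SWE": (66.3, 79.9),
    "Terminal-Bench 2.0": (52.8, 58.1),
}

for bench, (codex, codex_max) in scores.items():
    absolute = codex_max - codex
    relative = 100.0 * absolute / codex
    print(f"{bench}: +{absolute:.1f} points ({relative:.1f}% relative)")
```

The largest jump is on SWE-Lancer IC SWE, at roughly 13.6 points, or about a fifth better in relative terms.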

In qualitative tests, GPT-5.1-Codex-Max generates high-quality frontend designs with similar functionality and visual quality to GPT-5.1-Codex, but at lower overall token cost, thanks to more efficient reasoning traces.


Key Takeaways

  1. GPT-5.1-Codex-Max is a frontier agentic coding model built on an updated reasoning base, further trained on real software engineering tasks such as PR creation, code review, frontend coding, and Q&A. It is available today across the Codex CLI, IDE, cloud, and code review surfaces, with API access coming later.
  2. The model introduces native support for long-running work through compaction, repeatedly compressing its own history to span multiple context windows, enabling autonomous coding sessions that can continue for more than 24 hours over millions of tokens while staying on a single task.
  3. GPT-5.1-Codex-Max retains the reasoning-effort control from GPT-5.1, and at medium effort it outperforms GPT-5.1-Codex on SWE-bench Verified while using about 30 percent fewer thinking tokens, with a new Extra High (xhigh) mode for the hardest tasks.
  4. On frontier coding benchmarks with compaction enabled, GPT-5.1-Codex-Max at xhigh effort improves SWE-bench Verified from 73.7 percent to 77.9 percent, SWE-Lancer IC SWE from 66.3 percent to 79.9 percent, and Terminal-Bench 2.0 from 52.8 percent to 58.1 percent, compared with GPT-5.1-Codex at high effort.

Editorial Comments

GPT-5.1-Codex-Max is a clear statement that OpenAI is doubling down on long-running, agentic coding rather than short, single-shot edits. Compaction, frontier coding evaluations like SWE-bench Verified and SWE-Lancer IC SWE, and explicit reasoning-effort controls make this model a test case for scaling test-time compute in real software engineering workflows, not just benchmarks. The Preparedness Framework and Codex sandbox will be important as this capability moves into production pipelines. Overall, GPT-5.1-Codex-Max is a frontier agentic coding model that operationalises long-horizon reasoning in practical developer tools.

The post OpenAI Debuts GPT-5.1-Codex-Max, a Long-Horizon Agentic Coding Model With Compaction for Multi-Window Workflows appeared first on MarkTechPost.
