OpenAGI Foundation Launches Lux: A Foundation Computer Use Model that Tops Online Mind2Web with OSGym At Scale

How do you turn slow, manual click work across browsers and desktops into a reliable, automated system that can actually use a computer for you at scale? Lux is the latest example of computer use agents moving from research demo to infrastructure. The OpenAGI Foundation team has released Lux, a foundation model that operates real desktops and browsers. It reports a score of 83.6 on the Online Mind2Web benchmark, which covers more than 300 real world computer use tasks. That puts it ahead of Google Gemini CUA at 69.0, OpenAI Operator at 61.3 and Anthropic Claude Sonnet 4 at 61.0.

What Does Lux Actually Do?

Lux is a computer use model, not a chat model with a browser plugin. It takes a natural language goal, looks at the screen, and outputs low level actions such as clicks, key presses and scroll events. It can drive browsers, editors, spreadsheets, email clients and other desktop applications because it works on the rendered UI, not on application specific APIs.
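The observe-act loop described above can be sketched in a few lines. This is a minimal illustration built on a hypothetical `Action` schema, a fake desktop and a scripted policy. None of these names come from the OpenAGI SDK. In a real deployment the model would sit where `toy_policy` is, and real screenshots and input injection would replace `FakeDesktop`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass(frozen=True)
class Action:
    """One low level UI action (hypothetical schema, not Lux's real format)."""
    kind: str                  # "click", "key", "scroll" or "done"
    x: Optional[int] = None    # screen coordinates for clicks
    y: Optional[int] = None
    key: Optional[str] = None  # key name for key presses
    dy: int = 0                # scroll distance in pixels

@dataclass
class FakeDesktop:
    """Stand-in for a real screen: returns a fake frame id and logs actions."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        return f"frame-{len(self.log)}"

    def execute(self, action: Action) -> None:
        self.log.append(action)

# A policy maps (goal, current screenshot, action history) to the next action.
Policy = Callable[[str, str, List[Action]], Action]

def run_episode(goal: str, policy: Policy, desktop: FakeDesktop,
                max_steps: int = 50) -> List[Action]:
    """Observe the screen, ask the policy for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = desktop.screenshot()
        action = policy(goal, frame, history)
        if action.kind == "done":  # the policy signals the goal is reached
            break
        desktop.execute(action)
        history.append(action)
    return history

def toy_policy(goal: str, frame: str, history: List[Action]) -> Action:
    """Hypothetical scripted policy: click a field, press Enter, then stop."""
    plan = [Action("click", x=120, y=340), Action("key", key="Enter"), Action("done")]
    return plan[min(len(history), len(plan) - 1)]

desk = FakeDesktop()
steps = run_episode("submit the form", toy_policy, desk)
print([a.kind for a in steps])  # ['click', 'key']
```

The key design point is that the loop only exchanges pixels and generic input events with the environment. This is why the same agent can drive any application that renders a UI.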

From a developer perspective, Lux is available through the OpenAGI SDK and API console. The research team describes target workloads that include software QA flows, deep research runs, social media management, online store operations and bulk data entry. In all of these settings the agent must sequence dozens or hundreds of UI actions while staying aligned with a natural language task description.

Three Execution Modes For Different Control Levels

Lux ships with three execution modes that offer different tradeoffs between speed, autonomy and control.

Actor mode is the fast path. It runs at around 1 second per step and is aimed at clearly specified tasks, such as filling a form, pulling a report from a dashboard or extracting a small set of fields from a page. Think of it as a low latency macro engine that still understands natural language.

Thinker mode handles vague or multi step goals. It decomposes the high level instruction into smaller sub tasks and then executes them. Example workloads include multi page research, triage of long email queues, or navigation of analytics interfaces where the exact click path is not specified upfront.

Tasker mode gives maximum determinism. The caller supplies an explicit Python list of steps. Lux executes the steps one by one and retries until the sequence completes or hits a hard failure. This lets teams keep task graphs, guardrails and failure policies in their own code while delegating UI control to the model.
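A Tasker style run follows this pattern: a fixed step list, retries on transient errors, and a stop on hard failure. The sketch below shows that control flow. The exception classes, `run_task_list` and the three attempt limit are hypothetical illustrations of the behavior the article describes, not the actual OpenAGI SDK interface.

```python
from typing import Callable, Dict, List

class TransientStepError(Exception):
    """Recoverable failure, e.g. a button that has not rendered yet."""

class HardFailure(Exception):
    """Unrecoverable failure that aborts the whole sequence."""

def run_task_list(steps: List[str],
                  execute_step: Callable[[str], None],
                  max_attempts: int = 3) -> List[str]:
    """Run steps in order, retrying transient errors up to max_attempts per step.

    A HardFailure raised by execute_step propagates immediately; a step that
    still fails after max_attempts attempts is escalated to HardFailure.
    """
    completed: List[str] = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                execute_step(step)
            except TransientStepError as err:
                if attempt == max_attempts:
                    raise HardFailure(f"{step!r} failed {max_attempts} times: {err}") from err
                continue  # retry the same step
            completed.append(step)
            break  # step succeeded, move on to the next one
    return completed

# Usage: an executor whose "click Submit" step fails once before succeeding.
attempts: Dict[str, int] = {}

def flaky_executor(step: str) -> None:
    attempts[step] = attempts.get(step, 0) + 1
    if step == "click Submit" and attempts[step] < 2:
        raise TransientStepError("button not rendered yet")

done = run_task_list(["open form", "type name", "click Submit"], flaky_executor)
print(done, attempts["click Submit"])
```

Keeping the retry budget and the failure classification in caller code is the point of this mode. The model only performs each UI step, and the team's own code decides what counts as recoverable.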

Tasker, Actor and Thinker are the three primary modes, covering procedural workflows, fast execution and complex goal solving respectively.
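The guidance on when to use each mode fits in a small decision helper. The `Mode` enum and `choose_mode` function are hypothetical. They only encode the rule of thumb above:

- a caller supplied step list points to Tasker;
- a clearly specified goal points to Actor;
- anything vaguer points to Thinker.

```python
from enum import Enum
from typing import List, Optional

class Mode(Enum):
    """Hypothetical labels for the three Lux execution modes."""
    ACTOR = "actor"      # fast path (~1 s/step) for clearly specified tasks
    THINKER = "thinker"  # decomposes vague or multi step goals into sub tasks
    TASKER = "tasker"    # runs a caller supplied list of steps with retries

def choose_mode(goal_is_well_specified: bool,
                steps: Optional[List[str]] = None) -> Mode:
    """Rule of thumb drawn from the mode descriptions; not an official API."""
    if steps:                   # the caller already owns the plan
        return Mode.TASKER
    if goal_is_well_specified:  # e.g. fill a form, pull one report
        return Mode.ACTOR
    return Mode.THINKER         # e.g. multi page research, email triage

print(choose_mode(True).value)                                          # actor
print(choose_mode(False).value)                                         # thinker
print(choose_mode(False, ["open inbox", "archive newsletters"]).value)  # tasker
```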

Benchmarks, Latency And Cost

On Online Mind2Web, Lux reaches a success rate of 83.6%. On the same benchmark, Gemini CUA scores 69.0%, OpenAI Operator 61.3% and Claude Sonnet 4 61.0%. The benchmark contains more than 300 web based tasks collected from real services, so it is a useful proxy for practical agents that drive browsers and web apps.

Latency and cost are where the numbers become important for engineering teams. The OpenAGI team reports that Lux completes each step in about 1 second, while OpenAI Operator takes around 3 seconds per step in the same evaluation setting. The research team also states that Lux is about 10 times cheaper per token than Operator. Any agent can easily run hundreds of steps in a session, so these constant factors decide whether a workload is viable in production.
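A back-of-the-envelope calculation shows why these constant factors matter. The 1 s and 3 s per step latencies and the roughly 10x per token price gap are the reported figures. The rest is illustrative: the 300 step session, the `AgentProfile` type, and the assumption that both agents use the same number of tokens. Real sessions also pay network and tool overhead, which this ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentProfile:
    name: str
    seconds_per_step: float      # reported per step latency
    relative_token_price: float  # price per token, normalised so Operator = 1.0

LUX = AgentProfile("Lux", seconds_per_step=1.0, relative_token_price=0.1)
OPERATOR = AgentProfile("Operator", seconds_per_step=3.0, relative_token_price=1.0)

def session_minutes(agent: AgentProfile, steps: int) -> float:
    """Wall-clock minutes for one sequential session (ignores network/tool overhead)."""
    return agent.seconds_per_step * steps / 60

def tasks_per_hour(agent: AgentProfile, steps_per_task: int) -> float:
    """Tasks one sequential agent instance can finish per hour."""
    return 3600 / (agent.seconds_per_step * steps_per_task)

def relative_cost(agent: AgentProfile, tokens: int) -> float:
    """Cost in Operator-price units, assuming both agents use the same token count."""
    return agent.relative_token_price * tokens

for agent in (LUX, OPERATOR):
    print(f"{agent.name}: {session_minutes(agent, 300):.0f} min per 300-step session, "
          f"{tasks_per_hour(agent, 300):.0f} such tasks per hour")
# Lux: 5 min per 300-step session, 12 such tasks per hour
# Operator: 15 min per 300-step session, 4 such tasks per hour
```

Under these assumptions, a single agent instance finishes three times as many long tasks per hour. For the same token volume, the spend is about one tenth.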

Agentic Active Pre-training and Why OSGym Matters

Lux is trained with a method that the OpenAGI research team calls Agentic Active Pre-training. The team contrasts this with standard language model pre-training, which passively ingests text from the internet. The idea is that Lux learns by acting in digital environments and refining its behavior through large scale interaction, rather than only minimizing token prediction loss on static logs. The team says the optimization objective differs from classical reinforcement learning. Instead of a manually shaped reward, it is set up to favor self driven exploration and understanding.

This training setup depends on a data engine that can run many operating system environments in parallel. The OpenAGI team has already open sourced that engine as OSGym, under an MIT license that allows both research and commercial use. OSGym runs full operating system replicas, not only browser sandboxes. It supports tasks that span office software, browsers, development tools and multi application workflows.
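The standard library is enough to sketch the pattern of running many isolated environments in parallel to collect trajectories. `ToyOSReplica` is a hypothetical stand-in that fakes a multi turn episode with a random policy. It is not OSGym's API, which should be checked in the OSGym repository.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

Trajectory = List[Tuple[str, str]]  # (observation, action) pairs

@dataclass
class ToyOSReplica:
    """Hypothetical stand-in for one isolated OS replica (NOT OSGym's real API)."""
    replica_id: int
    max_turns: int = 5

    def rollout(self, seed: int) -> Trajectory:
        """Run one multi turn episode with a seeded random policy."""
        rng = random.Random(seed)
        trajectory: Trajectory = []
        for turn in range(self.max_turns):
            obs = f"replica{self.replica_id}-turn{turn}"
            action = rng.choice(["click", "type", "scroll"])
            trajectory.append((obs, action))
        return trajectory

def collect(num_replicas: int, workers: int = 8) -> List[Trajectory]:
    """Roll out one episode per replica concurrently; results keep replica order."""
    replicas = [ToyOSReplica(i) for i in range(num_replicas)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: r.rollout(seed=r.replica_id), replicas))

trajectories = collect(16)
print(len(trajectories), trajectories[0][0])
```

Threads are a reasonable choice here because, in a real setup, each worker spends most of its time waiting on a VM or container. The work is therefore I/O bound rather than CPU bound.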

Key Takeaways

Lux is a foundation computer use model that operates full desktops and browsers and reaches 83.6% success on the Online Mind2Web benchmark, ahead of Gemini CUA, OpenAI Operator and Claude Sonnet 4.
Lux exposes three modes, Actor, Thinker and Tasker. They cover low latency UI macros, multi step goal decomposition, and deterministic scripted execution for production workflows.
Lux is reported to run at around 1 second per step and to be about 10 times cheaper per token than OpenAI Operator. This matters for long horizon agents that run hundreds of actions per task.
Lux is trained with Agentic Active Pre-training, where the model learns by acting in environments rather than only consuming static web text. The aim is robust screen to action behavior rather than language modeling alone.
OSGym, the open source data engine behind Lux, can run more than 1,000 OS replicas and generate more than 1,400 multi turn trajectories per minute at low per replica cost. This gives teams a practical way to train and evaluate their own computer use agents.
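The reported OSGym throughput figures translate into rough collection volumes. Both numbers are stated as lower bounds ("more than"). The calculation assumes the rate is sustained around the clock. It also assumes the replica and trajectory figures describe the same setup, which the announcement does not explicitly confirm.

```python
# Reported lower bounds for OSGym (from the announcement).
TRAJECTORIES_PER_MINUTE = 1_400
REPLICAS = 1_000

def volume(per_minute: int, minutes: int) -> int:
    """Trajectories collected over a period, assuming a constant rate."""
    return per_minute * minutes

per_hour = volume(TRAJECTORIES_PER_MINUTE, 60)       # 84,000
per_day = volume(TRAJECTORIES_PER_MINUTE, 60 * 24)   # 2,016,000

# If all replicas contribute equally, each one finishes 1.4 trajectories per
# minute, i.e. roughly one multi turn trajectory every ~43 seconds.
per_replica_per_minute = TRAJECTORIES_PER_MINUTE / REPLICAS
seconds_per_trajectory = 60 / per_replica_per_minute

print(f"{per_hour:,} per hour, {per_day:,} per day, "
      f"{seconds_per_trajectory:.1f} s per trajectory per replica")
# 84,000 per hour, 2,016,000 per day, 42.9 s per trajectory per replica
```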

Check out the Official Announcement, Project and Repo.