xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL)

xAI launched Grok-4-Fast, a cost-optimized successor to Grok-4 that merges “reasoning” and “non-reasoning” behaviors right into a single set of weights controllable by way of system prompts. The mannequin targets high-throughput search, coding, and Q&A with a 2M-token context window and native tool-use RL that decides when to browse the net, execute code, or name instruments.
Architecture notice
Previous Grok releases break up long-chain “reasoning” and brief “non-reasoning” responses throughout separate fashions. Grok-4-Fast’s unified weight house reduces end-to-end latency and tokens by steering conduct by way of system prompts, which is related for real-time functions (search, assistive brokers, and interactive coding) the place switching fashions penalizes each latency and price.
Search and agentic use
Grok-4-Fast was skilled end-to-end with tool-use reinforcement studying and exhibits good points on search-centric agent benchmarks: BrowseComp 44.9%, SimpleQA 95.0%, Reka Research 66.0%, plus larger scores on Chinese variants (e.g., BrowseComp-zh 51.2%). xAI additionally cites non-public battle-testing on LMArena the place grok-4-fast-search
(codename “menlo”) ranks #1 within the Search Arena with 1163 Elo, and the textual content variant (codename “tahoe”) sits at #8 within the Text Arena, roughly on par with grok-4-0709
.
Performance and effectivity deltas
On inner and public benchmarks, Grok-4-Fast posts frontier-class scores whereas reducing token utilization. xAI reviews move@1 outcomes of 92.0% (AIME 2025, no instruments), 93.3% (HMMT 2025, no instruments), 85.7% (GPQA Diamond), and 80.0% (LiveCodeBench Jan–May), approaching or matching Grok-4 however utilizing ~40% fewer “pondering” tokens on common. The firm frames this as “intelligence density,” claiming a ~98% discount in value to achieve the identical benchmark efficiency as Grok-4 when the decrease token rely and new per-token pricing are mixed.
Deployment and value
The mannequin is usually obtainable to all customers in Grok’s Fast and Auto modes throughout internet and cellular; Auto will choose Grok-4-Fast for tough queries to enhance latency with out shedding high quality, and—for the primary time—free customers entry xAI’s newest mannequin tier. For builders, xAI exposes two SKUs—grok-4-fast-reasoning
and grok-4-fast-non-reasoning
—each with 2M context. Pricing (xAI API) is $0.20 / 1M enter tokens (<128k), $0.40 / 1M enter tokens (≥128k), $0.50 / 1M output tokens (<128k), $1.00 / 1M output tokens (≥128k), and $0.05 / 1M cached enter tokens.

5 Technical Takeaways:
- Unified mannequin + 2M context. Grok-4-Fast makes use of a single weight house for “reasoning” and “non-reasoning,” prompt-steered, with a 2,000,000-token window throughout each SKUs.
- Pricing for scale. API pricing begins at $0.20/M enter, $0.50/M output, with cached enter at $0.05/M and larger charges solely past 128K context.
- Efficiency claims. xAI reviews ~40% fewer “pondering” tokens at comparable accuracy vs Grok-4, yielding a ~98% cheaper price to match Grok-4 efficiency on frontier benchmarks.
- Benchmark profile. Reported move@1: AIME-2025 92.0%, HMMT-2025 93.3%, GPQA-Diamond 85.7%, LiveCodeBench (Jan–May) 80.0%.
- Agentic/search use. Post-training with tool-use RL; positioned for searching/search workflows with documented search-agent metrics and live-search billing in docs.
Summary
Grok-4-Fast packages Grok-4-level functionality right into a single, prompt-steerable mannequin with a 2M-token window, tool-use RL, and pricing tuned for high-throughput search and agent workloads. Early public alerts (LMArena #1 in Search, aggressive Text placement) align with xAI’s declare of comparable accuracy utilizing ~40% fewer “pondering” tokens, translating to decrease latency and unit price in manufacturing.
Check out the Technical details. Feel free to take a look at our GitHub Page for Tutorials, Codes and Notebooks. Also, be happy to comply with us on Twitter and don’t neglect to hitch our 100k+ ML SubReddit and Subscribe to our Newsletter.
The publish xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL) appeared first on MarkTechPost.