xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL)

ByRicardo September 20, 2025

xAI launched Grok-4-Fast, a cost-optimized successor to Grok-4 that merges “reasoning” and “non-reasoning” behaviors right into a single set of weights controllable by way of system prompts. The mannequin targets high-throughput search, coding, and Q&A with a 2M-token context window and native tool-use RL that decides when to browse the net, execute code, or name instruments.

Architecture notice

Previous Grok releases break up long-chain “reasoning” and brief “non-reasoning” responses throughout separate fashions. Grok-4-Fast’s unified weight house reduces end-to-end latency and tokens by steering conduct by way of system prompts, which is related for real-time functions (search, assistive brokers, and interactive coding) the place switching fashions penalizes each latency and price.

Search and agentic use

Grok-4-Fast was skilled end-to-end with tool-use reinforcement studying and exhibits good points on search-centric agent benchmarks: BrowseComp 44.9%, SimpleQA 95.0%, Reka Research 66.0%, plus larger scores on Chinese variants (e.g., BrowseComp-zh 51.2%). xAI additionally cites non-public battle-testing on LMArena the place grok-4-fast-search (codename “menlo”) ranks #1 within the Search Arena with 1163 Elo, and the textual content variant (codename “tahoe”) sits at #8 within the Text Arena, roughly on par with grok-4-0709.

Performance and effectivity deltas

On inner and public benchmarks, Grok-4-Fast posts frontier-class scores whereas reducing token utilization. xAI reviews move@1 outcomes of 92.0% (AIME 2025, no instruments), 93.3% (HMMT 2025, no instruments), 85.7% (GPQA Diamond), and 80.0% (LiveCodeBench Jan–May), approaching or matching Grok-4 however utilizing ~40% fewer “pondering” tokens on common. The firm frames this as “intelligence density,” claiming a ~98% discount in value to achieve the identical benchmark efficiency as Grok-4 when the decrease token rely and new per-token pricing are mixed.

Deployment and value

The mannequin is usually obtainable to all customers in Grok’s Fast and Auto modes throughout internet and cellular; Auto will choose Grok-4-Fast for tough queries to enhance latency with out shedding high quality, and—for the primary time—free customers entry xAI’s newest mannequin tier. For builders, xAI exposes two SKUs—grok-4-fast-reasoning and grok-4-fast-non-reasoning—each with 2M context. Pricing (xAI API) is $0.20 / 1M enter tokens (<128k), $0.40 / 1M enter tokens (≥128k), $0.50 / 1M output tokens (<128k), $1.00 / 1M output tokens (≥128k), and $0.05 / 1M cached enter tokens.

5 Technical Takeaways:

Unified mannequin + 2M context. Grok-4-Fast makes use of a single weight house for “reasoning” and “non-reasoning,” prompt-steered, with a 2,000,000-token window throughout each SKUs.
Pricing for scale. API pricing begins at $0.20/M enter, $0.50/M output, with cached enter at $0.05/M and larger charges solely past 128K context.
Efficiency claims. xAI reviews ~40% fewer “pondering” tokens at comparable accuracy vs Grok-4, yielding a ~98% cheaper price to match Grok-4 efficiency on frontier benchmarks.
Benchmark profile. Reported move@1: AIME-2025 92.0%, HMMT-2025 93.3%, GPQA-Diamond 85.7%, LiveCodeBench (Jan–May) 80.0%.
Agentic/search use. Post-training with tool-use RL; positioned for searching/search workflows with documented search-agent metrics and live-search billing in docs.

Summary

Grok-4-Fast packages Grok-4-level functionality right into a single, prompt-steerable mannequin with a 2M-token window, tool-use RL, and pricing tuned for high-throughput search and agent workloads. Early public alerts (LMArena #1 in Search, aggressive Text placement) align with xAI’s declare of comparable accuracy utilizing ~40% fewer “pondering” tokens, translating to decrease latency and unit price in manufacturing.

Check out the Technical details. Feel free to take a look at our GitHub Page for Tutorials, Codes and Notebooks. Also, be happy to comply with us on Twitter and don’t neglect to hitch our 100k+ ML SubReddit and Subscribe to our Newsletter.

The publish xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL) appeared first on MarkTechPost.

Agentic AI AI Agents

Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios
ByRicardo October 8, 2025

How do you audit frontier LLMs for misaligned conduct in real looking multi-turn, tool-use settings—at scale and past coarse mixture scores? Anthropic launched Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework that automates alignment audits by orchestrating an auditor agent to probe a goal mannequin throughout multi-turn, tool-augmented interactions and a choose mannequin…

Read More Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios
Agentic AI Artificial Intelligence

Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads
ByRicardo January 30, 2026

Qwen3-Max-Thinking is Alibaba’s new flagship reasoning model. It does not only scale parameters, it also changes how inference is done, with explicit control over thinking depth and built in tools for search, memory, and code execution. https://qwen.ai/blog?id=qwen3-max-thinking Model scale, data, and deployment Qwen3-Max-Thinking is a trillion-parameter MoE flagship LLM pretrained on 36T tokens and built…

Read More Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads
Agentic AI AI Paper Summary

Forget Keyword Imitation: ByteDance AI Maps Molecular Bonds in AI Reasoning to Stabilize Long Chain-of-Thought Performance and Reinforcement Learning (RL) Training
ByRicardo February 22, 2026

ByteDance Seed recently dropped a research that might change how we build reasoning AI. For years, devs and AI researchers have struggled to ‘cold-start’ Large Language Models (LLMs) into Long Chain-of-Thought (Long CoT) models. Most models lose their way or fail to transfer patterns during multi-step reasoning. The ByteDance team discovered the problem: we have…

Read More Forget Keyword Imitation: ByteDance AI Maps Molecular Bonds in AI Reasoning to Stabilize Long Chain-of-Thought Performance and Reinforcement Learning (RL) Training
Agentic AI AI Agents

Deep Research Agents: A Systematic Roadmap for LLM-Based Autonomous Research Systems
ByRicardo July 20, 2025

A team of researchers from University of Liverpool, Huawei Noah’s Ark Lab, University of Oxford and University College London presents a report explaining Deep Research Agents (DR agents), a new paradigm in autonomous research. These systems are powered by Large Language Models (LLMs) and designed to handle complex, long-horizon tasks that require dynamic reasoning, adaptive…

Read More Deep Research Agents: A Systematic Roadmap for LLM-Based Autonomous Research Systems
Agentic AI Articles

Why I’m (hopefully) never building another agent
ByRicardo February 3, 2026

You know that moment when you realize you’ve been solving the same problem over and over? That’s where I found myself about a year ago. My name’s Noa Flaherty, and I’m the CTO and co-founder of Vellum. After three years of building tools for AI development, I had this wild thought: what if we just…

Read More Why I’m (hopefully) never building another agent
Agentic AI AI Paper Summary

Alibaba Qwen Team Releases Mobile-Agent-v3 and GUI-Owl: Next-Generation Multi-Agent Framework for GUI Automation
ByRicardo August 31, 2025August 31, 2025

Desk of contents Introduction: The Rise of GUI Agents Architecture and Core Capabilities Training and Data Pipeline Benchmarking and Performance Real-World Deployment Conclusion: Toward General-Purpose GUI Agents Picture supply: Marktechpost.com Introduction: The Rise of GUI Brokers Trendy computing is dominated by graphical person interfaces throughout gadgets—cell, desktop, and net. Automating duties in these environments has…

Read More Alibaba Qwen Team Releases Mobile-Agent-v3 and GUI-Owl: Next-Generation Multi-Agent Framework for GUI Automation

xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL)

Architecture notice

Search and agentic use

Performance and effectivity deltas

Deployment and value

5 Technical Takeaways:

Summary

Anthropic AI Releases Petri: An Open-Source Framework for Automated Auditing by Using AI Agents to Test the Behaviors of Target Models on Diverse Scenarios

Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads

Forget Keyword Imitation: ByteDance AI Maps Molecular Bonds in AI Reasoning to Stabilize Long Chain-of-Thought Performance and Reinforcement Learning (RL) Training

Deep Research Agents: A Systematic Roadmap for LLM-Based Autonomous Research Systems

Why I’m (hopefully) never building another agent

Alibaba Qwen Team Releases Mobile-Agent-v3 and GUI-Owl: Next-Generation Multi-Agent Framework for GUI Automation

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!

Architecture notice

Search and agentic use

Performance and effectivity deltas

Deployment and value

5 Technical Takeaways:

Summary

Similar Posts

Curated by experts. Filtered for relevance.

Resources

About

Subscribe & learn more every day!