Meet ARGUS: A Scalable AI Framework for Training Large Recommender Transformers to One Billion Parameters

By Ricardo, September 6, 2025

Yandex has launched ARGUS (AutoRegressive Generative User Sequential modeling), a large-scale transformer-based framework for recommender systems that scales up to one billion parameters. This breakthrough places Yandex among a small group of global technology leaders, alongside Google, Netflix, and Meta, that have successfully overcome the long-standing technical barriers to scaling recommender transformers.

Breaking Technical Barriers in Recommender Systems

Recommender systems have long struggled with three stubborn constraints: short-term memory, limited scalability, and poor adaptability to shifting user behavior. Conventional architectures trim user histories down to a small window of recent interactions, discarding months or years of behavioral data. The result is a shallow view of intent that misses long-term habits, subtle shifts in taste, and seasonal cycles. As catalogs expand into the billions of items, these truncated models not only lose precision but also choke on the computational demands of personalization at scale. The consequence is familiar: stale recommendations, lower engagement, and fewer opportunities for serendipitous discovery.

Very few companies have successfully scaled recommender transformers beyond experimental setups. Google, Netflix, and Meta have invested heavily in this area, reporting gains from architectures like YouTubeDNN, PinnerFormer, and Meta’s Generative Recommenders. With ARGUS, Yandex joins this select group of companies demonstrating billion-parameter recommender models in live services. By modeling entire behavioral timelines, the system uncovers both obvious and hidden correlations in user activity. This long-horizon perspective allows ARGUS to capture evolving intent and cyclical patterns with far greater fidelity. For example, instead of reacting only to a recent purchase, the model learns to anticipate seasonal behaviors, such as automatically surfacing the preferred brand of tennis balls when summer approaches, without requiring the user to repeat the same signals year after year.

Technical Innovations Behind ARGUS

The framework introduces several key advances:

Dual-objective pre-training: ARGUS decomposes autoregressive learning into two subtasks: next-item prediction and feedback prediction. This combination improves both imitation of historical system behavior and modeling of true user preferences.
Scalable transformer encoders: Models scale from 3.2M to 1B parameters, with consistent performance improvements across all metrics. At the billion-parameter scale, the pairwise accuracy uplift increased by 2.66%, demonstrating the emergence of a scaling law for recommender transformers.
Extended context modeling: ARGUS handles user histories up to 8,192 interactions long in a single pass, enabling personalization over months of behavior rather than just the last few clicks.
Efficient fine-tuning: A two-tower architecture allows offline computation of embeddings and scalable deployment, reducing inference cost relative to prior target-aware or impression-level online models.
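
To make two of the ideas in this list more concrete, here is a minimal sketch in plain Python of a dual-objective loss (next-item prediction plus feedback prediction) and of two-tower scoring over precomputed item embeddings. It is an illustration under stated assumptions, not the ARGUS implementation. The function names, the equal weighting of the two losses, the dot-product scoring, and the toy data are all hypothetical choices that the article does not specify.

```python
import math

def next_item_loss(item_logits, target_item):
    """Softmax cross-entropy for predicting the next item in a history."""
    peak = max(item_logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in item_logits))
    return log_norm - item_logits[target_item]

def feedback_loss(feedback_logit, liked):
    """Binary cross-entropy for predicting the user's reaction to an item."""
    p_like = 1.0 / (1.0 + math.exp(-feedback_logit))
    return -math.log(p_like if liked else 1.0 - p_like)

def dual_objective_loss(steps, feedback_weight=1.0):
    """Average of both losses over the positions of one user history.

    feedback_weight is an assumed knob; the article does not say how the
    two subtasks are weighted.
    """
    total = sum(
        next_item_loss(s["item_logits"], s["target_item"])
        + feedback_weight * feedback_loss(s["feedback_logit"], s["liked"])
        for s in steps
    )
    return total / len(steps)

def rank_items(user_embedding, item_embeddings, top_k=2):
    """Two-tower scoring: dot product of a user vector with precomputed item vectors."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_embedding, vector))
        for item_id, vector in item_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data, invented purely for illustration.
history = [
    {"item_logits": [2.0, 0.5, -1.0], "target_item": 0, "feedback_logit": 1.5, "liked": True},
    {"item_logits": [0.1, 0.3, 2.2], "target_item": 2, "feedback_logit": -0.5, "liked": False},
]
item_embeddings = {
    "track_a": [1.0, 0.0],
    "track_b": [0.0, 1.0],
    "track_c": [0.7, 0.7],
}

print(dual_objective_loss(history))
print(rank_items([0.9, 0.1], item_embeddings))
```

In this sketch the item vectors can be computed once offline and reused for every request, which is the property the two-tower design relies on: only the user vector and a cheap similarity computation are needed at serving time.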

Real-World Deployment and Measured Gains

ARGUS has already been deployed at scale on Yandex’s music platform, serving tens of millions of users. In production A/B tests, the system achieved:

+2.26% increase in total listening time (TLT)
+6.37% increase in like likelihood
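
For readers unfamiliar with how such A/B figures are usually derived, the sketch below shows the standard relative-uplift arithmetic. The article reports only the relative changes, so the control and treatment values here are hypothetical, and `relative_uplift_percent` is an illustrative helper rather than anything from Yandex's tooling.

```python
def relative_uplift_percent(control, treatment):
    """Relative change of the treatment metric over the control metric, in percent."""
    if control == 0:
        raise ValueError("control metric must be non-zero")
    return (treatment - control) / control * 100.0

# Hypothetical per-user averages; only the relative changes come from the article.
control_minutes, treatment_minutes = 100.0, 102.26
print(round(relative_uplift_percent(control_minutes, treatment_minutes), 2))
```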

These constitute the largest recorded quality improvements in the platform’s history for any deep learning-based recommender model.

Future Directions

Yandex researchers plan to extend ARGUS to real-time recommendation tasks, explore feature engineering for pairwise ranking, and adapt the framework to high-cardinality domains such as large e-commerce and video platforms. The demonstrated ability to scale user-sequence modeling with transformer architectures suggests that recommender systems are poised to follow a scaling trajectory similar to that of natural language processing.

Conclusion

With ARGUS, Yandex has established itself as one of the few global leaders driving state-of-the-art recommender systems. By openly sharing its breakthroughs, the company is not only improving personalization across its own services but also accelerating the evolution of recommendation technologies for the entire industry.

Check out the paper here. Thanks to the Yandex team for the thought leadership and resources for this article.

The post Meet ARGUS: A Scalable AI Framework for Training Large Recommender Transformers to One Billion Parameters appeared first on MarkTechPost.
