Meituan Releases LongCat-2.0: A 1.6T-Parameter Open MoE Model with Native 1M Context and LongCat Sparse Attention

Meituan has released LongCat-2.0, a large-scale Mixture-of-Experts (MoE) language model. It carries 1.6 trillion total parameters and activates about 48 billion per token. The model targets agentic coding: code understanding, generation, and execution inside agent workflows.

Two details stand out. First, LongCat-2.0 supports a native 1-million-token context window. Second, both training and serving ran entirely on domestic AI ASIC superpods.

What is LongCat-2.0?

LongCat-2.0 is Meituan’s next-generation trillion-parameter open model. It follows LongCat-Flash, a 560B model released in 2025. The architecture was designed around one goal: reliable, efficient agentic coding.

Pretraining spanned more than 35 trillion tokens over tens of millions of accelerator-hours. Meituan reports no rollbacks or irrecoverable loss spikes during the run. That stability claim matters on non-Nvidia hardware, where tooling is less mature.

Architecture: How a 1.6T Model Stays Cheap to Run

The design combines four ideas that reduce the cost of scale. Each one is worth understanding on its own.

Zero-computation experts: Not every token needs heavy compute. Simple tokens like punctuation route to a zero-computation expert and return unchanged. Complex tokens engage more expert capacity. A PID controller adjusts expert bias to hold the average in range. This produces the 33B–56B dynamic activation window instead of a fixed cost. The MoE backbone uses a shortcut-connected design (ScMoE) for higher throughput.
LongCat Sparse Attention (LSA): Standard attention scales quadratically with context length. LSA selects only the most relevant tokens, dropping the scaling closer to linear. Meituan describes it as an evolution of DeepSeek Sparse Attention (DSA). It layers three orthogonal indexing strategies. Streaming-aware Indexing turns fragmented memory reads into contiguous blocks. Cross-Layer Indexing reuses attention saliency across adjacent layers. Hierarchical Indexing applies coarse-to-fine two-stage filtering. Together they sustain the 1M-token window without hitting a memory wall.
N-gram Embedding: The design adds a 135-billion-parameter N-gram embedding module. It sits orthogonal to the MoE experts in sparse dimensions. Meituan says it captures dense local token relationships. It also reduces memory I/O during large-batch decoding.
Post-training (MOPD): A dedicated pipeline (MOPD) fuses three teacher expert groups, covering Agent, Reasoning, and Interaction capabilities, into one unified model.
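
The zero-computation idea above can be made concrete with a toy simulation. The sketch below is not Meituan's implementation: the expert counts, the Gaussian router scores, the controller gains, and every function name are assumptions picked for illustration. It only shows the general mechanism the description implies, in which a bias on the zero-computation experts is nudged by a PID-style controller until the average number of compute-spending experts per token settles near a target.

```python
import random

# Hypothetical sizes chosen for illustration; not LongCat-2.0's real configuration.
NUM_FFN_EXPERTS = 4         # experts that run a real feed-forward computation
NUM_ZERO_EXPERTS = 2        # zero-computation experts (identity)
TOP_K = 2                   # experts selected per token
TARGET_FFN_PER_TOKEN = 1.0  # desired average number of real experts per token


def route(scores, zero_bias):
    """Return the indices of the TOP_K experts after biasing zero experts."""
    biased = [
        s + (zero_bias if i >= NUM_FFN_EXPERTS else 0.0)
        for i, s in enumerate(scores)
    ]
    ranked = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)
    return ranked[:TOP_K]


def expert_output(index, hidden):
    """Zero-computation experts return the input unchanged."""
    if index >= NUM_FFN_EXPERTS:
        return list(hidden)
    return [2.0 * h for h in hidden]  # stand-in for a real FFN expert


def ffn_count(chosen):
    """Count how many selected experts actually spend compute."""
    return sum(1 for i in chosen if i < NUM_FFN_EXPERTS)


def simulate(steps=400, batch=128, kp=0.1, ki=0.05, kd=0.0, seed=0):
    """Run the control loop; return (final bias, mean FFN count, last 50 steps)."""
    rng = random.Random(seed)
    zero_bias, integral, prev_error = 0.0, 0.0, 0.0
    averages = []
    num_experts = NUM_FFN_EXPERTS + NUM_ZERO_EXPERTS
    for _ in range(steps):
        used = 0
        for _ in range(batch):
            # Assumed router scores: independent Gaussians per expert.
            scores = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
            used += ffn_count(route(scores, zero_bias))
        average = used / batch
        averages.append(average)
        # Positive error means too much compute, so zero experts get a boost.
        error = average - TARGET_FFN_PER_TOKEN
        integral += error
        zero_bias = kp * error + ki * integral + kd * (error - prev_error)
        prev_error = error
    return zero_bias, sum(averages[-50:]) / 50


if __name__ == "__main__":
    bias, average = simulate()
    print(f"zero-expert bias: {bias:.3f}, average FFN experts per token: {average:.3f}")
```

With no bias, this toy router would pick about 1.33 real experts per token, so the controller has to raise the zero-expert bias to reach the target of 1.0. The derivative gain is left at zero here because the per-batch average is noisy.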

For serving, Meituan uses a 6D parallelism scheme and a prefill-decode disaggregated architecture. It also employs ‘super kernels’ and L2-cache weight prefetching to hide I/O latency.

Benchmarks

Meituan positions LongCat-2.0 as an agentic coding model. Every figure below comes from Meituan’s own testing.

Benchmark	LongCat-2.0	What it measures
SWE-bench Pro	59.5	Real-world software engineering tasks
Terminal-Bench 2.1	70.8	Execution and error recovery in shells
SWE-bench Multilingual	77.3	Cross-language repository tasks

On SWE-bench Pro, Meituan reports LongCat-2.0 edging GPT-5.5 (58.6). Meituan also claims overall performance comparable to Google’s Gemini 3.1 Pro. The reported edge is concentrated in software engineering. On broader general-agent benchmarks such as FORTE and BrowseComp, coverage indicates it trails leading frontier systems. Independent leaderboard confirmation is not yet available.

LongCat-2.0 vs LongCat-Flash

The jump from the previous generation is large on paper. This table uses each model’s published specs.

Attribute	LongCat-2.0	LongCat-Flash
Total parameters	1.6T	560B
Active per token	~48B (33B–56B)	~27B (18.6B–31.3B)
Context window	1M tokens (native)	128K tokens
Long-context attention	LongCat Sparse Attention	Multi-head Latent Attention
Reported hardware	Domestic AI ASIC superpods (training + serving)	H800 GPUs (inference reported)
Max output	128K tokens	Not specified
License	MIT	MIT
Released	June 30, 2026	September 2025
Weights	Coming soon	Open
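
The headline numbers in the table hide a useful ratio: how much of each model is active per token. The short calculation below uses only the total and approximate active parameter counts from the table; the dictionary layout and function name are illustrative choices, not anything published by Meituan.

```python
# Parameter counts taken from the comparison table (approximate active values).
SPECS = {
    "LongCat-2.0": {"total": 1.6e12, "active": 48e9},
    "LongCat-Flash": {"total": 560e9, "active": 27e9},
}


def active_fraction(name):
    """Share of total parameters activated per token, as a fraction."""
    spec = SPECS[name]
    return spec["active"] / spec["total"]


if __name__ == "__main__":
    for name in SPECS:
        print(f"{name}: {active_fraction(name):.1%} of parameters active per token")
```

On these figures LongCat-2.0 activates roughly 3% of its parameters per token, against a little under 5% for LongCat-Flash, so the newer model is sparser even though its absolute active count is higher.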

Use Cases With Examples

LongCat-2.0 is tuned for agent-style software work, not casual chat. A few concrete patterns match its strengths.

Whole-repository reasoning: Feed an entire mid-sized codebase into the 1M-token window. Ask the model to trace a bug across many files at once. This avoids the summarization hacks that shorter windows force.
Multi-step terminal tasks: Run the model inside an agent loop with shell access. It can execute commands, read errors, and retry until a task passes. The Terminal-Bench 2.1 focus targets exactly this workflow.
Repository-level edits: Ask for a refactor that spans multiple modules and tests. The model reasons over the full context before proposing coordinated changes.
Cross-language migration: Use the SWE-bench Multilingual strength for polyglot repositories. The model can port logic between languages while preserving behavior.

These patterns run inside standard agent harnesses. Dev teams can therefore adopt the model without building new tooling.
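
For readers who have not seen one, the execute, read errors, retry cycle from the terminal-task pattern looks roughly like the sketch below. It is a generic illustration, not the loop of any named harness. The `propose` callback stands in for a model call, and `scripted_propose` is a hypothetical stub that fails once and then succeeds so the example runs without an API key.

```python
import subprocess
import sys


def run_command(argv, timeout=30):
    """Run one command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout + proc.stderr


def agent_loop(propose, max_steps=5):
    """Ask `propose` for commands until one exits with code 0 or steps run out."""
    history = []
    for _ in range(max_steps):
        argv = propose(history)  # in a real harness this would be a model call
        exit_code, output = run_command(argv)
        history.append({"argv": argv, "exit_code": exit_code, "output": output})
        if exit_code == 0:
            return True, history
    return False, history


def scripted_propose(history):
    """Hypothetical stand-in for the model: fail first, then succeed."""
    if not history:
        return [sys.executable, "-c", "import sys; sys.exit('boom')"]
    # A real model would read history[-1]["output"] to decide what to fix.
    return [sys.executable, "-c", "print('ok')"]


if __name__ == "__main__":
    succeeded, steps = agent_loop(scripted_propose)
    for step in steps:
        print(step["exit_code"], step["output"].strip())
    print("task passed" if succeeded else "gave up")
```

A production harness would add sandboxing, command allow-lists, and output truncation before feeding shell output back to the model.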

How to Access It

LongCat-2.0 is reachable via the LongCat API Platform. It exposes both OpenAI-compatible and Anthropic-compatible endpoints. The model is also on OpenRouter and in harnesses like Claude Code, OpenClaw, OpenCode, and Codex. Local self-hosting is not yet possible, since weights remain pending.

The OpenAI-compatible endpoint uses the model ID LongCat-2.0. Maximum output length is 131072 tokens (128K). The snippet below calls the documented chat-completions endpoint.

# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at the LongCat OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_LONGCAT_API_KEY",
    base_url="https://api.longcat.chat/openai/v1",
)

resp = client.chat.completions.create(
    model="LongCat-2.0",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplicate I/O logic."},
    ],
    max_tokens=4096,  # LongCat-2.0 supports up to 131072 (128K)
)

# Print the text of the first completion choice.
print(resp.choices[0].message.content)

Pricing is reported at $0.75 per million input tokens and $2.95 per million output tokens. A launch promotion lists $0.30 and $1.20, with cached context reads free. These figures come from third-party coverage and could change.
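
To see what those rates mean for a long-context request, a quick estimate helps. The helper below is an illustrative sketch: the function name and the example token counts are made up, the prices are the reported figures above, and free cached reads are not modeled.

```python
# Reported USD prices per million tokens; both sets may change.
LIST_PRICES = {"input": 0.75, "output": 2.95}
PROMO_PRICES = {"input": 0.30, "output": 1.20}


def estimate_cost(input_tokens, output_tokens, prices):
    """Estimate the USD cost of one request at per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token prompt with a 100K-token answer.
    for label, prices in (("list", LIST_PRICES), ("promo", PROMO_PRICES)):
        cost = estimate_cost(1_000_000, 100_000, prices)
        print(f"{label}: ${cost:.3f}")
```

At list prices that hypothetical request comes to about $1.05, and to about $0.42 under the launch promotion.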