
Zhipu AI Releases GLM-4.6: Achieving Enhancements in Real-World Coding, Long-Context Processing, Reasoning, Searching and Agentic AI

Zhipu AI has launched GLM-4.6, a significant update to its GLM series focused on agentic workflows, long-context reasoning, and practical coding tasks. The model raises the input window to 200K tokens with a 128K max output, targets lower token consumption in applied tasks, and ships with open weights for local deployment.

https://z.ai/blog/glm-4.6

So, what exactly is new?

  • Context + output limits: 200K input context and 128K maximum output tokens.
  • Real-world coding results: On the extended CC-Bench (multi-turn tasks run by human evaluators in isolated Docker environments), GLM-4.6 is reported near parity with Claude Sonnet 4 (48.6% win rate) and uses ~15% fewer tokens vs. GLM-4.5 to complete tasks. Task prompts and agent trajectories are published for inspection.
  • Benchmark positioning: Zhipu summarizes “clear gains” over GLM-4.5 across eight public benchmarks and states parity with Claude Sonnet 4/4.5 on several; it also notes GLM-4.6 still lags Sonnet 4.5 on coding, a useful caveat for model selection.
  • Ecosystem availability: GLM-4.6 is available via the Z.ai API and OpenRouter; it integrates with popular coding agents (Claude Code, Cline, Roo Code, Kilo Code), and existing Coding Plan users can upgrade by switching the model name to glm-4.6.
  • Open weights + license: The Hugging Face model card lists License: MIT and Model size: 357B params (MoE) with BF16/F32 tensors. (MoE “total parameters” are not equal to active parameters per token; no active-params figure is stated for 4.6 on the card.)
  • Local inference: vLLM and SGLang are supported for local serving; weights are on Hugging Face and ModelScope.
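Because GLM-4.6 is exposed through OpenAI-compatible endpoints (Z.ai API, OpenRouter), existing integrations can switch over by changing only the model field. The sketch below builds such a chat-completions request body; the endpoint URL is an assumption for illustration, while the `glm-4.6` identifier comes from the article. Check the provider's docs for the exact base URL and auth header before sending anything.

```python
import json

# Assumed endpoint for illustration only -- consult the Z.ai or OpenRouter
# documentation for the real base URL and authentication scheme.
API_URL = "https://api.z.ai/api/paas/v4/chat/completions"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat-completions payload for GLM-4.6."""
    return {
        # Upgrading an existing Coding Plan integration is, per the article,
        # just a matter of switching this model name to "glm-4.6".
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Write a unit test for a FIFO queue.")
print(json.dumps(payload, indent=2))
```

The same payload shape works against any OpenAI-compatible gateway, which is why the coding agents listed above can adopt the model without protocol changes.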

Summary

GLM-4.6 is an incremental but material step: a 200K context window, ~15% token reduction on CC-Bench versus GLM-4.5, near-parity task win rate with Claude Sonnet 4, and immediate availability via Z.ai, OpenRouter, and open-weight artifacts for local serving.


FAQs

1) What are the context and output token limits?
GLM-4.6 supports a 200K input context and 128K maximum output tokens.

2) Are open weights available and under what license?
Yes. The Hugging Face model card lists open weights with License: MIT and a 357B-parameter MoE configuration (BF16/F32 tensors).

3) How does GLM-4.6 compare to GLM-4.5 and Claude Sonnet 4 on applied tasks?
On the extended CC-Bench, GLM-4.6 reports ~15% fewer tokens vs. GLM-4.5 and near-parity with Claude Sonnet 4 (48.6% win rate).

4) Can I run GLM-4.6 locally?
Yes. Zhipu provides weights on Hugging Face/ModelScope and documents local inference with vLLM and SGLang; community quantizations are appearing for workstation-class hardware.
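As a deployment sketch, vLLM's OpenAI-compatible server can host the open weights directly from Hugging Face. The repo id `zai-org/GLM-4.6` and the parallelism settings below are assumptions: a 357B-parameter MoE in BF16 requires a multi-GPU node, so adjust the flags to your hardware or use a community quantization for smaller setups.

```shell
# Serve GLM-4.6 locally via vLLM's OpenAI-compatible API server.
# Repo id and flag values are illustrative assumptions; see the model
# card and vLLM docs for recipes that match your GPU count and memory.
vllm serve zai-org/GLM-4.6 \
    --tensor-parallel-size 8 \
    --max-model-len 200000
```

Once the server is up, any OpenAI-compatible client can target it by pointing the base URL at the local port and setting the model name accordingly.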


Check out the GitHub Page, Hugging Face Model Card and Technical details.

The post Zhipu AI Releases GLM-4.6: Achieving Enhancements in Real-World Coding, Long-Context Processing, Reasoning, Searching and Agentic AI appeared first on MarkTechPost.
