DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts
DeepSeek-AI has released a preview of the DeepSeek-V4 series, two Mixture-of-Experts (MoE) language models built around one core challenge: making one-million-token context windows practical and affordable at inference time. The series comprises DeepSeek-V4-Pro, with 1.6T total parameters and 49B activated per token, and DeepSeek-V4-Flash, with 284B total parameters and 13B activated…
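The parameter figures above imply a small activation ratio per token, which is the property that keeps MoE inference cost manageable despite the large total size. A minimal sketch of that arithmetic, using only the figures quoted above (illustrative only, not DeepSeek's code or configuration):

```python
# Illustrative only: activation-ratio arithmetic from the figures quoted in
# the article, not DeepSeek's actual code or configuration.

def activated_fraction(active_b: float, total_b: float) -> float:
    """Fraction of the model's weights touched per token in an MoE forward pass."""
    return active_b / total_b

models = {
    "DeepSeek-V4-Pro":   (49.0, 1600.0),  # 49B activated of 1.6T total
    "DeepSeek-V4-Flash": (13.0, 284.0),   # 13B activated of 284B total
}

for name, (active_b, total_b) in models.items():
    print(f"{name}: {activated_fraction(active_b, total_b):.1%} of parameters active per token")
```

Running this prints roughly 3.1% for V4-Pro and 4.6% for V4-Flash: each token's forward pass touches only a few percent of the total weights, which is why per-token compute stays closer to that of a dense ~50B or ~13B model.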
