Alibaba’s Qwen AI Releases Compact Dense Qwen3-VL 4B/8B (Instruct & Thinking) With FP8 Checkpoints

Do you really need a large VLM when dense Qwen3-VL 4B/8B (Instruct/Thinking) with FP8 runs in low VRAM yet retains the 256K→1M context window and the full capability surface? Alibaba’s Qwen team has expanded its multimodal lineup with dense Qwen3-VL models at 4B and 8B scales, each shipping in two task profiles—Instruct and Thinking—plus FP8-quantized checkpoints for low-VRAM deployment. The drop arrives as a smaller, edge-friendly complement to the previously released 30B (MoE) and 235B (MoE) tiers and retains the same capability surface: image/video understanding, OCR, spatial grounding, and GUI/agent control.

What’s in the release?
SKUs and variants: The new additions comprise four dense models—Qwen3-VL-4B and Qwen3-VL-8B, each in Instruct and Thinking editions—alongside FP8 versions of the 4B/8B Instruct and Thinking checkpoints. The official announcement explicitly frames these as “compact, dense” models with lower VRAM usage and full Qwen3-VL capabilities retained.
Context length and capability surface: The model cards list a native 256K context, expandable to 1M, and document the full feature set: long-document and video comprehension, 32-language OCR, 2D/3D spatial grounding, visual coding, and agentic GUI control on desktop and mobile. These attributes carry over to the new 4B/8B SKUs.
Architecture notes: Qwen3-VL highlights three core updates: Interleaved-MRoPE for robust positional encoding over time/width/height (long-horizon video), DeepStack for fusing multi-level ViT features and sharpening image–text alignment, and Text–Timestamp Alignment (beyond T-RoPE) for event localization in video. These design details appear in the new model cards as well, signaling architectural continuity across sizes.
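To make the positional-encoding idea concrete, here is a minimal, hypothetical sketch of interleaving rotary frequencies across the time/height/width axes, so that every axis sees both high- and low-frequency components rather than a contiguous chunk of the spectrum. The function name, the round-robin pattern, and the shapes are illustrative assumptions, not Qwen’s implementation:

```python
import torch

def interleaved_mrope_angles(pos_t, pos_h, pos_w, head_dim=64, base=10000.0):
    """Hypothetical sketch: assign rotary frequency pairs to the t/h/w axes
    in round-robin order, so each axis spans the whole frequency spectrum."""
    n_pairs = head_dim // 2                                # RoPE rotates dim pairs
    inv_freq = base ** (-torch.arange(n_pairs) / n_pairs)  # per-pair frequency
    axis_of_pair = torch.arange(n_pairs) % 3               # t, h, w, t, h, w, ...
    pos = torch.stack([pos_t, pos_h, pos_w]).float()       # [3, seq]
    angles = pos[axis_of_pair].T * inv_freq                # [seq, n_pairs]
    return torch.cos(angles), torch.sin(angles)

# Example: 8 video tokens laid out on a 2x2 spatial grid over 2 frames.
t = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
h = torch.tensor([0, 0, 1, 1] * 2)
w = torch.tensor([0, 1] * 4)
cos, sin = interleaved_mrope_angles(t, h, w)  # each of shape [8, 32]
```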
Project timeline: The Qwen3-VL GitHub “News” section records the release of Qwen3-VL-4B (Instruct/Thinking) and Qwen3-VL-8B (Instruct/Thinking) on Oct 15, 2025, following earlier releases of the 30B MoE tier and organization-wide FP8 availability.

FP8: deployment-relevant details
Numerics and parity claim: The FP8 repositories describe fine-grained FP8 quantization with block size 128, with performance metrics nearly identical to the original BF16 checkpoints. For teams evaluating precision trade-offs across multimodal stacks (vision encoders, cross-modal fusion, long-context attention), vendor-produced FP8 weights reduce the re-quantization and re-validation burden.
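In practice, “fine-grained FP8 with block size 128” typically means each 128-element block of weights carries its own scale, bounding quantization error relative to a single per-tensor scale. Below is a minimal sketch of that general technique in PyTorch; the block shape, rounding, and scale dtype of the official checkpoints may differ:

```python
import torch
import torch.nn.functional as F

def quantize_fp8_blockwise(w: torch.Tensor, block: int = 128):
    """Illustrative block-wise FP8 quantization: one scale per 128 weights."""
    flat = w.flatten()
    pad = (-flat.numel()) % block                    # pad to a block multiple
    blocks = F.pad(flat, (0, pad)).view(-1, block)   # [n_blocks, block]
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448.0 for e4m3
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / fp8_max
    return (blocks / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8_blockwise(q, scale, shape):
    flat = (q.to(torch.float32) * scale).flatten()
    return flat[: shape.numel()].view(shape)

w = torch.randn(4096, 4096)
q, s = quantize_fp8_blockwise(w)
err = (dequantize_fp8_blockwise(q, s, w.shape) - w).abs().max()
print(f"max abs reconstruction error: {err:.4f}")
```

Because each scale adapts to its own block, an outlier weight only degrades the 127 values sharing its block rather than the whole tensor—one intuition for why per-block FP8 can report near-BF16 metrics.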
Tooling status: The 4B-Instruct-FP8 card notes that Transformers does not yet load these FP8 weights directly and recommends vLLM or SGLang for serving; the card includes working launch snippets. Separately, the vLLM recipes guide recommends FP8 checkpoints for memory efficiency on H100. Together, these point to immediate, supported paths for low-VRAM inference.
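As a concrete starting point, here is a hedged offline-inference sketch using vLLM’s Python API (the repo id follows the release’s naming—verify it on Hugging Face—and the image URL and sampling values are placeholders):

```python
# Requires a recent vLLM build with Qwen3-VL support.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-VL-4B-Instruct-FP8",
    max_model_len=32768,  # trim from the native 256K to fit a small GPU
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/receipt.png"}},  # placeholder
        {"type": "text", "text": "Extract the line items and totals as JSON."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0].outputs[0].text)
```

The same checkpoint can instead be exposed as an OpenAI-compatible endpoint with `vllm serve Qwen/Qwen3-VL-4B-Instruct-FP8`; the model card’s own launch snippets remain the authoritative commands.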
Key Takeaways
- Qwen released dense Qwen3-VL 4B and 8B models, each in Instruct and Thinking variants, with FP8 checkpoints.
- The FP8 releases use fine-grained FP8 quantization (block size 128) with near-BF16 metrics; Transformers loading is not yet supported—use vLLM or SGLang.
- The capability surface is preserved: 256K→1M context, 32-language OCR, spatial grounding, video reasoning, and GUI/agent control.
- Model-card-reported sizes: Qwen3-VL-4B ≈ 4.83B params; Qwen3-VL-8B-Instruct ≈ 8.77B params.
Editorial Comments
Qwen’s decision to ship dense Qwen3-VL 4B/8B in both Instruct and Thinking forms with FP8 checkpoints is the practical part of the story: lower-VRAM, deployment-ready weights (fine-grained FP8, block size 128) and explicit serving guidance (vLLM/SGLang) make these models straightforward to deploy. The capability surface—256K context expandable to 1M, 32-language OCR, spatial grounding, video understanding, and agent control—remains intact at the smaller scales, which matters more than leaderboard rhetoric for teams targeting single-GPU or edge budgets.
Check out the model on Hugging Face and the GitHub repo.