Hugging Face Open-Sourced FineVision: A New Multimodal Dataset with 24 Million Samples for Training Vision-Language Models (VLMs)
Hugging Face has just released FineVision, an open multimodal dataset designed to set a new standard for Vision-Language Models (VLMs). With 17.3 million images, 24.3 million samples, 88.9 million question-answer turns, and nearly 10 billion answer tokens, FineVision positions itself as one of the largest and most structured publicly available VLM training datasets. FineVision aggregates…
