Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation
For years, the computer vision community has operated on two separate tracks: generative models (which produce images) and discriminative models (which understand them). The assumption was simple: models that are good at making images aren’t necessarily good at reading them. A new paper from Google, titled “Image Generators are Generalist Vision Learners” (arXiv:2604.20329),…
