Visual understanding: Unlocking the next frontier in AI


At the NYC AIAI Summit, Joseph Nelson, CEO & Co-Founder of Roboflow, took the stage to spotlight a critical but often overlooked frontier in AI: vision.
In a field dominated by breakthroughs in language models, Nelson argued that visual understanding, the ability of machines to interpret the physical world, is just as essential for building intelligent systems that can operate in real-world conditions.
From powering instant replay at Wimbledon to enabling edge-based quality control in electric vehicle factories, his talk offered a grounded look at how visual AI is already transforming industries – and what it will take to make it truly robust, accessible, and ready for anything.
Roboflow now supports a million developers, and Nelson walked through what some of them are building: real-world, production-level applications of visual AI across industries, open-source projects, and more. These examples show that visual understanding is already being deployed at scale.
Three key themes in visual AI today
Nelson outlined three major themes in his talk:
- The long tails of computer vision. In visual AI, long-tail edge cases are a critical constraint. These rare or unpredictable situations limit the ability of models, including large vision-language models, to fully understand the real world.
- What the future of visual models looks like. A central question is whether one model will eventually rule them all, or whether the future lies in a collection of smaller, purpose-built models. The answer will shape how machine learning is applied to visual tasks going forward.
- Running real-time visual AI at the edge. Nelson emphasized the importance of systems that run on your own data, in real time, at the edge. This isn’t just a technical detail; it’s foundational to how visual AI will be used in the real world (a minimal sketch of such a loop follows this list).
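To make that last point concrete, here is a minimal sketch of an on-device, real-time detection loop. It is illustrative only: it assumes the open-source ultralytics and opencv-python packages with a COCO-pretrained YOLOv8 model, and the model name and camera index are placeholders rather than the specific stack Nelson described.

```python
# Minimal sketch of a real-time, on-device detection loop (illustrative only).
# Assumes the open-source `ultralytics` and `opencv-python` packages and a
# COCO-pretrained YOLOv8 model; the model name and camera index are placeholders,
# not the specific stack described in the talk.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # small model, suited to constrained/edge hardware
cap = cv2.VideoCapture(0)    # local camera: frames stay on the device

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]  # run detection on this frame
    annotated = result.plot()                # draw boxes and labels
    cv2.imshow("edge inference", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):    # press 'q' to stop
        break

cap.release()
cv2.destroyAllWindows()
```

Because inference runs on the device capturing the frames, the video never has to leave the local machine, which is the heart of the "your own data, in real time, at the edge" argument.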
Where AI meets the real world: The role of computer vision
Joseph Nelson framed computer vision as the point where artificial intelligence intersects directly with the physical world. “At Roboflow, we think that computer vision is where AI meets the real world,” he explained.
He emphasized that vision is a primary human sense, predating even language, and pointed out that some civilizations thrived without a written system, relying instead on visual understanding. That same principle applies when building software systems: giving them vision is like giving them read access to the physical world.
This visual “read access” enables software to answer practical questions:
- How many people are in a conference room?
- Were a set of products manufactured correctly?
- Did objects make their way from point A to point B?
Nelson illustrated this with a range of real-world examples:
- Counting candies produced in a shop
- Validating traffic flows and lane usage
- Evaluating a basketball player’s shot
- Measuring field control in a soccer match
Despite the diversity of these applications, the common thread is clear: each uses visual understanding to generate actionable insights.
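As a concrete illustration of that "read access," the sketch below counts people in a photo of a conference room. It is a minimal, hypothetical example assuming the open-source ultralytics package and a COCO-pretrained detector (where class 0 is "person"); the image path is a placeholder, and this is not the specific tooling discussed in the talk.

```python
# Minimal sketch of visual "read access": counting people in a room photo.
# Illustrative only; assumes the open-source `ultralytics` package and a
# COCO-pretrained detector (class id 0 = "person"). The image path is hypothetical.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                  # COCO-pretrained detector
result = model("conference_room.jpg")[0]    # hypothetical image path

# Count detections whose predicted class is "person" (COCO class id 0).
person_count = sum(1 for cls_id in result.boxes.cls if int(cls_id) == 0)
print(f"People detected: {person_count}")
```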
How Roboflow powers visual AI at scale
Roboflow’s mission is to provide the tools, platform, and solutions that enable enterprises to build and deploy visual AI. According to Nelson, users typically approach Roboflow in one of two ways:
- By creating open-source projects, much like publishing code on GitHub
- By building private projects for internal or commercial use
On the open-source front, Roboflow hosts the largest community of computer vision developers on the web. The scale is significant:
- Over 500 million user-labeled and shared images
- More than 200,000 pre-trained models available
This ecosystem gives Roboflow unique insight into how computer vision is being applied, where it encounters challenges, and where it delivers the most value.
Roboflow also serves a wide range of enterprise customers. Nelson shared that more than half of the Fortune 100 have built with Roboflow, especially in domains grounded in the physical world, such as:
- Retail operations
- Electric vehicle manufacturing
- Product logistics and shipping
Backed by a strong network of investors, Roboflow continues to grow its platform and support the expanding needs of developers and businesses working at the intersection of AI and the real world.
Bridging the gap to visual AGI
In wrapping up, Nelson tied together the major themes of his talk:
- Better datasets (like RF100VL)
- Better models (like RF-DETR)
- Deployment flexibility, including on constrained and local hardware
Together, these advancements move us beyond the metaphor of “a brain in a jar.” Instead, Nelson described the vision for a true visual cortex, a key step toward real-world AI systems that can see, reason, and act.
“When we build with Roboflow… you’re a part of making sure that AI meets the real world and delivers on the promise of what we know is possible.”
Final thoughts
Joseph Nelson closed his talk at the NYC AIAI Summit with a clear message: for AI to meet the real world, it must see and understand it. That means building better datasets, such as RF100VL, creating models that generalize across messy, real-world domains, and ensuring those models can run in real-time, often at the edge.
From live sports broadcasts to pharmaceutical safety checks, and from open-source cat toys to advanced vehicle assembly lines, the breadth of visual AI’s impact is already vast. But the work is far from over. As Nelson put it, we’re still crossing the bridge from large models as “brains in a jar” to intelligent systems with a working visual cortex.
By contributing to open-source tools, adapting models for deployment in the wild, and holding systems accountable through realistic evaluations, developers and researchers alike play a crucial role in advancing visual understanding. Roboflow’s mission is to support that effort, so that AI not only thinks, but sees.