
Tencent Hunyuan Video-Foley brings lifelike audio to AI video


A team at Tencent’s Hunyuan lab has created a new AI, ‘Hunyuan Video-Foley,’ that finally brings lifelike audio to generated video. It’s designed to listen to videos and generate a high-quality soundtrack that’s perfectly in sync with the action on screen.

Ever watched an AI-generated video and felt like something was missing? The visuals might be stunning, but they often have an eerie silence that breaks the spell. In the film industry, the sound that fills that silence (the rustle of leaves, the clap of thunder, the clink of a glass) is called Foley art, and it’s a painstaking craft performed by experts.

Matching that level of detail is a huge challenge for AI. For years, automated systems have struggled to create believable sounds for videos.

How is Tencent solving the AI-generated audio for video problem?

One of the biggest reasons video-to-audio (V2A) models often fell short in the sound department was what the researchers call “modality imbalance”. Essentially, the AI was listening more to the text prompts it was given than it was watching the actual video.

For instance, if you gave a model a video of a busy beach with people walking and seagulls flying, but the text prompt only said “the sound of ocean waves,” you’d likely just get the sound of waves. The AI would completely ignore the footsteps in the sand and the calls of the birds, making the scene feel lifeless.

On top of that, the quality of the audio was often subpar, and there simply wasn’t enough high-quality video with sound to train the models effectively.

Tencent’s Hunyuan team tackled these problems from three different angles:

  1. Tencent realised the AI needed a better education, so they built a massive, 100,000-hour library of video, audio, and text descriptions for it to learn from. They created an automated pipeline that filtered out low-quality content from the internet, discarding clips with long silences or compressed, fuzzy audio, so that the AI learned from the best possible material.
  2. They designed a smarter architecture for the AI. Think of it like teaching the model to multitask properly. The system first pays very close attention to the visual-audio link to get the timing right, like matching the thump of a footstep to the exact moment a shoe hits the pavement. Once it has that timing locked down, it then incorporates the text prompt to understand the overall mood and context of the scene. This dual approach ensures the specific details of the video are never overlooked.
  3. To ensure the sound was high-quality, they used a training strategy called Representation Alignment (REPA). This is like having an expert audio engineer constantly looking over the AI’s shoulder during its training. It compares the AI’s work to features from a pre-trained, professional-grade audio model to guide it towards producing cleaner, richer, and more stable sound.
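To make the third point a little more concrete, here is a minimal sketch of what an alignment term of this kind could look like. It is an illustration under assumptions, not Tencent’s implementation: the function names `cosine_similarity` and `alignment_loss` are hypothetical, the features are plain Python lists, and the learned projection that would normally map the model’s internal features into the reference model’s feature space is left out. The idea shown is simply that the loss gets smaller as the generator’s per-frame features point in the same direction as those of a pre-trained audio model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "not aligned".
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_loss(model_features, reference_features):
    """Negative mean cosine similarity across frames (hypothetical helper).

    model_features:     per-frame features from the model being trained
    reference_features: per-frame features from a frozen, pre-trained
                        audio model (assumed to have the same shape)
    """
    if len(model_features) != len(reference_features):
        raise ValueError("both sequences need the same number of frames")
    if not model_features:
        raise ValueError("at least one frame is required")
    similarities = [
        cosine_similarity(m, r)
        for m, r in zip(model_features, reference_features)
    ]
    # Minimising this value pushes the two sets of features to agree.
    return -sum(similarities) / len(similarities)
```

In a setup like this, the alignment term would be added to the model’s main training loss with some weighting, so the reference model steers training without replacing the primary objective. The weighting and the exact similarity measure used by Hunyuan Video-Foley are not stated in this article.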

The results speak for themselves

When Tencent tested Hunyuan Video-Foley against other leading AI models, the audio results were clear. It wasn’t just that the computer-based metrics were better; human listeners consistently rated its output as higher quality, better matched to the video, and more accurately timed.

Across the board, the AI delivered improvements in making the sound match the on-screen action, both in terms of content and timing. The results across multiple evaluation datasets support this:

Evaluation results of Tencent Hunyuan Video-Foley against other leading AI models.

Tencent’s work helps to close the gap between silent AI videos and an immersive viewing experience with quality audio. It’s bringing the magic of Foley art to the world of automated content creation, which could be a powerful capability for filmmakers, animators, and creators everywhere.



The post Tencent Hunyuan Video-Foley brings lifelike audio to AI video appeared first on AI News.
