AI learns how vision and sound are connected, without human intervention

May 22, 2025

Humans naturally learn by making connections between sight and sound. For instance, we can watch someone playing the cello and recognize that the cellist’s movements are generating the music we hear.

A new approach developed by researchers from MIT and elsewhere improves an AI model’s ability to learn in this same fashion. This could be useful in applications such as journalism and film production, where the model could help curate multimodal content through automatic video and audio retrieval.
