In the video above, a scientist hits stuff with a drumstick.
The sounds that come out of that are pretty normal: the swish of the drumstick through some leaves, the drag of it sliding across a rock, the light crack as it knocks the leg of a chair. But many of those sounds are actually fake. They were added to the video, which had no sound, by a deep-learning algorithm created by MIT researchers.
Over several months, the researchers made more than a thousand recordings of different things getting hit with a drumstick. When they were done, they had a library of 46,000 sounds, each one bundled with a few frames of video. Then they fed all that data to a deep-learning algorithm, which analyzed the relationship between each piece of audio and the physical properties it could perceive in the videos.
The resulting algorithm is capable of predicting what sound any surface or object will make when it gets hit with a drumstick. Sometimes it pulls sounds from its vast library, and sometimes it synthesizes new sounds based on the properties it sees in the video.
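To make the "pulls sounds from its library" idea concrete, here is a minimal sketch of nearest-match retrieval. It is not the researchers' code: the library entries, the three-number feature vectors, and the function name `nearest_sound` are all invented for illustration, and the sketch assumes a separate model has already turned the video into a predicted set of sound features.

```python
import math

# Hypothetical toy library: each entry pairs a label with a made-up
# feature vector standing in for one recorded drumstick sound.
SOUND_LIBRARY = [
    ("leaves", [0.1, 0.8, 0.3]),
    ("rock", [0.9, 0.2, 0.1]),
    ("chair leg", [0.7, 0.4, 0.6]),
]


def nearest_sound(predicted, library=SOUND_LIBRARY):
    """Return the label of the library sound closest to the predicted features.

    Closeness is plain Euclidean distance, chosen here for simplicity;
    it is an assumption, not a detail taken from the research.
    """
    best = min(library, key=lambda entry: math.dist(predicted, entry[1]))
    return best[0]


# Features that (in this toy setup) a model might predict for a hard surface.
print(nearest_sound([0.85, 0.25, 0.15]))  # prints "rock"
```

In this toy version the library has three entries; the same lookup would work, just more slowly, over tens of thousands of recorded sounds.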
The researchers think this kind of technology could be used to automatically produce sound effects for movies and TV shows. But more importantly, it could also lead to robots that are instinctively better at understanding their surroundings. The set of physical properties, patterns, and relationships that is key to predicting sound also happens to tell you a lot of other things about the world; a robot that “instinctively” understands that cement is hard, and that grass is soft, would have a huge leg up in navigating the world around it.
Sound, it turns out, can tell you about a whole lot more than just sound.