If you thought the rampant spread of text-based fake news was as bad as it could get, think again. Generating fake news videos that are indistinguishable from real ones is growing easier by the day.
A team of computer scientists at the University of Washington has used artificial intelligence to render visually convincing videos of Barack Obama saying things he’s said before, but in a totally new context.
In a paper published this month, the researchers explained their methodology: Using a neural network trained on 17 hours of footage of the former US president’s weekly addresses, they were able to generate mouth shapes from arbitrary audio clips of Obama’s voice. The shapes were then textured to photorealistic quality and overlaid onto Obama’s face in a different “target” video. Finally, the researchers retimed the target video to move Obama’s body naturally to the rhythm of the new audio track.
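The four stages described above — mouth shapes from audio, texturing, compositing into a target video, and retiming — can be sketched as a pipeline. The sketch below is purely illustrative: every function name and data shape is a hypothetical stand-in (a trivial linear map replaces the trained neural network, and dicts replace real video frames), not the researchers' actual implementation.

```python
# Conceptual sketch of the audio-to-video pipeline, with toy stand-ins
# for each stage. Nothing here is the authors' code; the structure only
# mirrors the four steps the paper describes.

def audio_to_mouth_shape(audio_frame):
    """Stage 1 (stand-in): map an audio feature frame to a sparse mouth
    shape. The paper trains a neural network for this; a trivial linear
    map stands in for the learned model here."""
    return [0.5 * x + 0.1 for x in audio_frame]

def texture_mouth(shape):
    """Stage 2 (stand-in): render the sparse shape as a photorealistic
    mouth texture, represented here as a labeled dict."""
    return {"shape": shape, "texture": "synthesized"}

def composite(target_frame, mouth):
    """Stage 3 (stand-in): overlay the synthesized mouth onto a frame
    of the target video."""
    frame = dict(target_frame)
    frame["mouth"] = mouth
    return frame

def retime(frames, audio_len):
    """Stage 4 (stand-in): retime the target frames so body motion
    tracks the rhythm of the new audio (here: stretch or trim the frame
    list to the audio length)."""
    if not frames:
        return []
    return [frames[min(int(i * len(frames) / audio_len), len(frames) - 1)]
            for i in range(audio_len)]

def synthesize(audio_frames, target_frames):
    """Run the full pipeline: one output video frame per audio frame."""
    retimed = retime(target_frames, len(audio_frames))
    out = []
    for a, t in zip(audio_frames, retimed):
        mouth = texture_mouth(audio_to_mouth_shape(a))
        out.append(composite(t, mouth))
    return out

# Toy inputs: four audio feature frames, two target video frames.
video = synthesize([[0.0], [0.2], [0.4], [0.6]],
                   [{"frame": 0}, {"frame": 1}])
```

The point of the structure, not the arithmetic: audio drives only the mouth region, while the rest of each frame comes from real footage of the target video, which is why the results can look so convincing.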
This isn’t the first study to demonstrate the modification of a talking head in a video. As Quartz’s Dave Gershgorn previously reported, in June of last year, Stanford researchers published a similar methodology for altering a person’s pre-recorded facial expressions in real time to mimic the expressions of another person making faces into a webcam. The new study, however, adds the ability to synthesize video directly from audio, effectively generating a higher-dimensional output (video) from a lower-dimensional input (audio).
In their paper, the researchers pointed to several practical applications of being able to generate high-quality video from audio, including helping hearing-impaired people lip-read audio during a phone call or creating realistic digital characters in the film and gaming industries. But the more disturbing consequence of such a technology is its potential to proliferate video-based fake news. Though the researchers used only real audio for the study, they were able to skip and reorder Obama’s sentences seamlessly and even use audio from an Obama impersonator to achieve near-perfect results. The rapid advancement of voice-synthesis software also provides easy, off-the-shelf solutions for compelling, falsified audio.
There is some good news. Right now, the effectiveness of this video-synthesis technique is limited by the amount and quality of footage available for a given person. Currently, the paper noted, the AI algorithms require at least several hours of footage and cannot handle certain edge cases, like facial profiles. The researchers chose Obama as their first case study because his weekly addresses provide an abundance of publicly available high-definition footage of him looking directly at the camera and adopting a consistent tone of voice. Synthesizing videos of other public figures who don’t fulfill those conditions would be more challenging and require further technological advancement. That buys time for fake-video detection technologies to develop in parallel. As The Economist reported earlier this month, one solution could be “to demand that recordings come with their metadata, which show when, where and how they were captured. Knowing such things makes it possible to eliminate a photograph as a fake on the basis, for example, of a mismatch with known local conditions at the time.”
But as the doors to new forms of fake media continue to swing open, it will ultimately fall to consumers to tread carefully.