Nothing pixelated will stay safe on the internet

It’s becoming much easier to defeat internet privacy measures, especially blurred or pixelated images. Those methods make it tough for people to see sensitive information such as obscured license plate numbers or censored faces, but researchers from the University of Texas at Austin and Cornell University say the practice is wildly insecure in the age of machine learning.

By Dave Gershgorn | 3 min read | Updated July 21, 2022

Using simple deep learning tools, the three-person team was able to identify obfuscated faces and numbers with alarming accuracy. On an industry-standard dataset where humans had a 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times). The algorithm doesn’t produce a deblurred image; it simply identifies what it sees in the obscured photo, based on information it already knows. The approach works with blurred and pixelated images, as well as P3, a type of JPEG encryption pitched as a secure way to hide information.
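
The "guess five times" figure is what machine learning practitioners usually call top-k accuracy: a guess counts as correct if the true identity appears anywhere among the model's k highest-ranked candidates. Here is a minimal sketch of that metric in Python. The scores and labels are invented for illustration and are not the researchers' data, and the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of examples whose true label is among the k highest-scoring guesses.

    scores: one list per example, holding a score for every candidate identity.
    true_labels: the index of the correct identity for each example.
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Rank candidate identities from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Made-up scores for four obscured photos and three candidate people.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
true_labels = [1, 1, 2, 2]

print("one guess:  ", top_k_accuracy(scores, true_labels, 1))
print("two guesses:", top_k_accuracy(scores, true_labels, 2))
```

Allowing more guesses can only raise the score, which is why the five-guess numbers in the study are higher than the single-guess ones.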

“We’re using this off-the-shelf, poor man’s approach,” says Vitaly Shmatikov, co-author of the paper and professor at Cornell. “Just take a bunch of training data, throw some neural networks on it, throw standard image recognition algorithms on it, and even with this approach…we can obtain pretty good results.”

Shmatikov acknowledges that the Max Planck Institute’s work is more nuanced, taking into account contextual clues about identity. But he says that his simpler approach shows how weak these privacy methods really are. (He doesn’t mention that his method is also 18% more accurate in a comparable test.)

To build the attacks that identified faces in YouTube videos, researchers took publicly available pictures and blurred the faces with YouTube’s video tool. They then fed the algorithm both sets of images, so it could learn how to correlate blur patterns to the unobscured faces. When given different images of the same people, the algorithm could determine their identity with 57% accuracy, or 85% when given five chances.
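
The general recipe (obscure known photos, train on the obscured versions, then identify new obscured photos of the same people) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' method: the "faces" are random 8x8 grids, the obfuscation is a simple block-average mosaic rather than YouTube's blur tool, and a nearest-centroid classifier stands in for the neural networks the team used. All function names are hypothetical.

```python
import random


def pixelate(img, block):
    """Mosaic a square image: replace each block x block tile with its mean."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for r0 in range(0, n, block):
        for c0 in range(0, n, block):
            rows = range(r0, min(r0 + block, n))
            cols = range(c0, min(c0 + block, n))
            tile = [img[r][c] for r in rows for c in cols]
            mean = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def flatten(img):
    return [v for row in img for v in row]


def make_photo(template, rng, noise=0.1):
    """A new 'photo' of a person: their template plus random pixel noise."""
    return [[v + rng.gauss(0, noise) for v in row] for row in template]


def train_centroids(examples):
    """Average the pixelated training images of each person."""
    sums, counts = {}, {}
    for label, img in examples:
        vec = flatten(img)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def identify(centroids, img):
    """Return the person whose average pixelated image is closest."""
    vec = flatten(img)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


rng = random.Random(0)
SIZE, BLOCK, PEOPLE = 8, 2, 5  # toy settings chosen for illustration

# One synthetic "face" per person; real attacks would use real photos.
templates = {p: [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
             for p in range(PEOPLE)}

# The classifier only ever sees pixelated images, for training and testing.
train = [(p, pixelate(make_photo(t, rng), BLOCK))
         for p, t in templates.items() for _ in range(10)]
test = [(p, pixelate(make_photo(t, rng), BLOCK))
        for p, t in templates.items() for _ in range(10)]

centroids = train_centroids(train)
correct = sum(identify(centroids, img) == p for p, img in test)
accuracy = correct / len(test)
print(f"identified {correct} of {len(test)} pixelated photos")
```

The point of the sketch is that the classifier never has to reverse the mosaic. It only needs pixelated photos of different people to remain distinguishable from one another, which they do here because averaging discards detail but not identity.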

“It’s pretty simple stuff,” says Richard McPherson, co-author and a visiting student at Cornell Tech. “The only real restriction is having a data set you could train these machine learning techniques on. But that’s available.”

Training data could be as simple as images on Facebook or a staff directory on a website. For numbers and letters (even handwritten), the training data is publicly available online.

Companies like YouTube that recommend blurring should make it clear that their privacy measures only protect information from humans, not machines or determined adversaries, say McPherson and Shmatikov.

“In security and privacy, people do not fully appreciate the power of machine learning,” says Shmatikov. “Until somebody shows how even off-the-shelf technology can be used for privacy breaches, people in security and privacy aren’t going to realize it.”