We need to own up to the fact that we’ve become digital hoarders running out of room for our data.
In 2016, humans collectively generated 16.1 trillion gigabytes of digital information; that annual number is expected to increase more than tenfold by 2025 (pdf, p. 3). Our personal pictures, texts, and emails are just a drop in the bucket; the real deluge comes from scientists, who create vast amounts of information as they run experiments and clinical trials on the smallest components of biology and observe other planets, looking as far into the universe as they can. And the places we currently put that data, mostly external hard drives and cloud server rooms, aren’t a perfect solution. They take up a lot of space, and need an upgrade every decade or so.
Biotech startups are looking within to solve the problem. Specifically, within our cells.
The latest trend in Big Storage is encoding data in DNA. Genetic material is, after all, already a coding technology. It codes for life: every human cell contains 3 billion base pairs, the partnered nucleotides that are either adenine (A) paired with thymine (T), or guanine (G) paired with cytosine (C). Their order serves as a recipe for all the proteins that carry out the functions we (and all living things) need to live.
Several institutions, including the Defense Advanced Research Projects Agency, the US military’s research arm, have already developed DNA-based storage systems that can encode all sorts of information into tiny, stable chains of molecules that can last for thousands of years. Some estimates suggest that, encoded in DNA, all the data in the world could be driven around in the back of a car.
There’s just one problem: making the unique DNA that codes for information is expensive. As Wired reports, it can cost around $100,000 to print the 1,500,000 or so base pairs currently needed to record a minute of stereo sound.
Catalog, a Boston-based company started by former Massachusetts Institute of Technology researchers, is trying to reduce those costs. Rather than filling one long strand of DNA with information, Catalog makes short DNA fragments, 20 to 30 base pairs long, that can be sewn together using enzymes. The arrangement of these snippets is what determines their meaning. Essentially, it’s like a language: in English, there are only 26 letters, but by arranging them in different ways we can, theoretically, make an infinite number of different words. Catalog estimates that it will cost less than three thousandths of a cent to store one MB of data. For context, on Spotify, a minute of stereo sound is about 2.4 MB at the highest quality.
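To put the two price points side by side, here is a rough back-of-the-envelope calculation in Python. It uses only the figures quoted above (about $100,000 for 1,500,000 base pairs, Catalog's estimate of under 0.003 cents per MB, and 2.4 MB per minute of audio). The variable names are made up for illustration, and the comparison is only indicative: the figures come from different sources and may not describe the same audio quality.

```python
# Figures quoted in the article; variable names are illustrative only.
SYNTHESIS_COST_USD = 100_000      # reported cost to print one minute of stereo sound
BASE_PAIRS_PER_MINUTE = 1_500_000 # base pairs reportedly needed for that minute
CATALOG_CENTS_PER_MB = 0.003      # Catalog's estimate (an upper bound: "less than")
AUDIO_MB_PER_MINUTE = 2.4         # Spotify, highest quality

# Cost of conventional DNA synthesis per base pair, in dollars.
cost_per_base_pair = SYNTHESIS_COST_USD / BASE_PAIRS_PER_MINUTE

# Cost of one minute of audio at Catalog's estimated rate, in dollars.
catalog_usd_per_minute = AUDIO_MB_PER_MINUTE * CATALOG_CENTS_PER_MB / 100

# How many times cheaper the estimate is than the reported synthesis cost.
ratio = SYNTHESIS_COST_USD / catalog_usd_per_minute

print(f"Synthesis cost per base pair: ${cost_per_base_pair:.4f}")
print(f"Catalog estimate per minute:  ${catalog_usd_per_minute:.6f}")
print(f"Rough cost ratio:             {ratio:.2e}")
```

On these numbers, conventional synthesis works out to roughly 7 cents per base pair, while a minute of audio at Catalog's estimated rate would cost a small fraction of a cent.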
On June 26, Catalog announced it has already used this system to store the novel The Hitchhiker’s Guide to the Galaxy by Douglas Adams and the poem “The Road Not Taken” by Robert Frost in genetic material. The company also says it has received $9 million of backing from various venture capital firms. It plans to make its technology commercially available next year.
Any company interested just has to submit its desired data to Catalog, which will then convert that information into binary code in long chains of 0’s and 1’s. Next, the company’s process assigns a pair of these 0’s and 1’s to each nucleotide; for example, “A” could be 00, “C” could be 01, “T” could be 11, and “G” could be 10. The data are then converted, using this DNA code, into a tiny vial of genetic material that can be stored safely in any refrigerator that reaches 4°C (39.2°F), like those used in restaurants.
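The binary-to-nucleotide step can be sketched in a few lines of Python. This is a minimal illustration that uses the example mapping from the paragraph above (A = 00, C = 01, T = 11, G = 10); it is not Catalog's actual process, and the function names are hypothetical.

```python
# Example mapping from the article; Catalog's real scheme may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "11": "T", "10": "G"}

def bytes_to_bits(data: bytes) -> str:
    """Turn raw bytes into a string of 0's and 1's, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)

def encode_to_dna(data: bytes) -> str:
    """Map each pair of bits to one nucleotide letter."""
    bits = bytes_to_bits(data)
    # 8 bits per byte means the bit string always splits evenly into pairs.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode_to_dna(b"Hi"))  # prints CAGACGGC
```

Each byte becomes four nucleotides, so the two-character text "Hi" turns into an eight-letter sequence.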
When information is stored in this way, you need to have this key in order to decode the DNA back into binary code, and then back into a legible format. Hyunjun Park, one of Catalog’s founders, says that as soon as the company starts encoding information, it will make its key public so customers can have their information resequenced by anyone. However, companies could request a new, private key so that their information has an extra layer of encryption.
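Decoding simply runs the mapping in reverse, which is why the key matters. The Python sketch below assumes the example mapping given earlier serves as the public key, and it invents a second mapping to stand in for a private key. Both the names and the private mapping are hypothetical, not anything Catalog has published.

```python
# Public key: the example mapping from the article, read in reverse.
PUBLIC_KEY = {"A": "00", "C": "01", "T": "11", "G": "10"}
# Hypothetical private key: same four bases, different bit pairs.
PRIVATE_KEY = {"A": "10", "C": "11", "T": "00", "G": "01"}

def decode_from_dna(strand: str, key: dict[str, str]) -> bytes:
    """Translate nucleotides back to bits, then group the bits into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 (4 bases per byte)")
    bits = "".join(key[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = "CAGACGGC"
print(decode_from_dna(strand, PUBLIC_KEY))   # prints b'Hi'
print(decode_from_dna(strand, PRIVATE_KEY))  # wrong key, so the bytes are not b'Hi'
```

With only four bases there are just 24 possible one-to-one mappings of this kind, so a toy key like this shows the idea rather than offering real protection.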
If Catalog’s process works as the company says it will, it could pave the way to make genetic data storage accessible for everyone, for centuries. Park says that the company is working with archivists to figure out the best way to include a legible, everlasting key in the DNA code itself so that our great-great-grandkids (or alien invaders) can decode Catalog DNA on their own in the future.