We have seen the future and it is DNA storage!

The Hutch Report

Technological advances in the last 10 years alone have made our world more connected than ever, giving people a simpler and faster way to document and share memories. Millions of people take pictures, record movies or write reports and messages every day, and our digitally connected world is now creating information at an unprecedented rate. Roughly 16 zettabytes of data are produced each year (one zettabyte is one billion terabytes), and the research firm IDC has estimated that by 2025 we will be producing over 160 zettabytes a year.

Although all this data may be seen as a treasure trove for researchers, advertisers or data analysts, we are finding that current storage technologies are not able to keep up. This torrent of information may soon outstrip the ability of hard drives to capture it. Since we’re not going to stop taking pictures and recording movies, we need to develop new ways to store them.

Our daily production of photos, documents, messages and movies is not the only source of data. Advances in biotechnology, and in genomics in particular, promise to produce vast amounts of data as well.

It has been 18 years since the first draft of the human genome sequence was completed in 2000. That draft, however, was merely a first step. A deeper understanding requires many more sequenced genomes, along with cheaper and faster sequencing methods, and achieving this demands vast amounts of computing power and storage. According to a report published in the journal PLoS Biology, between 100 million and 2 billion human genomes could be sequenced by 2025. Once the extra data needed to account for sequencing errors and preliminary analysis is included, the amount of data that must be stored for a single genome becomes 30 times larger than the size of the genome itself. The data-storage demands for genomics alone are estimated at 2 to 40 exabytes (1 exabyte is 10^18 bytes). Biologists and computer scientists now worry that their disciplines are not geared up to cope with the coming flood of genomics data.
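To put the PLoS Biology range in perspective, the snippet below simply divides the storage estimate by the number of genomes. It assumes (our assumption, not the report's wording) that the low storage figure pairs with the low genome count and the high figure with the high count; under that reading both ends work out to the same amount of data per genome.

```python
# Back-of-envelope check on the genomics estimates quoted above.
# ASSUMPTION: 2 EB pairs with 100 million genomes, 40 EB with 2 billion.
EXABYTE = 10**18
GIGABYTE = 10**9

low_genomes, high_genomes = 100_000_000, 2_000_000_000
low_storage, high_storage = 2 * EXABYTE, 40 * EXABYTE

# Stored data per sequenced genome, in gigabytes.
per_genome_low = low_storage / low_genomes / GIGABYTE
per_genome_high = high_storage / high_genomes / GIGABYTE

# The text says stored data is about 30x the size of the genome itself,
# so dividing by 30 gives the implied size of the underlying genome.
implied_genome_size = per_genome_low / 30

print(f"{per_genome_low:.0f} GB and {per_genome_high:.0f} GB per genome")
print(f"implied size of the genome itself: {implied_genome_size:.2f} GB")
```

Both ends of the range imply roughly 20 GB of stored data per genome, which is where the 30-fold overhead mentioned above becomes expensive at scale.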

Curiously, DNA, the very molecule behind this flood of genomic data, may also provide a solution to our storage needs. In the 1970s Frederick Sanger of the Medical Research Council’s Laboratory of Molecular Biology and his colleagues published the sequence of a particular genome, which prompted speculation that it might contain a message from aliens. The idea was not taken very seriously, but the possibility intrigued many scientists, among them the Harvard biologist George Church, who began to wonder whether messages could be deliberately encoded in DNA.

Along with two Harvard colleagues, Church translated an HTML draft of a 50,000-word book on synthetic biology into binary code and converted it into a DNA sequence. DNA molecules are long chains of smaller molecules called nucleotides (adenine, cytosine, thymine and guanine, usually designated A, C, T and G). Rather than sequences of 0s and 1s, as in electronic media, DNA storage uses sequences of nucleotides. Church and his team stored one bit per nucleotide, coding 0 as A or C and 1 as G or T, and “wrote” this sequence onto a microchip as a series of DNA fragments using an ink-jet DNA printer.
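A one-bit-per-nucleotide scheme of this kind can be sketched in a few lines. Because each bit has two possible letters, the encoder has some freedom in which one to write. The rule used here (take the first option unless it would repeat the previous base, so no letter ever appears twice in a row) is a hypothetical choice for illustration, not necessarily the rule Church's team used; the function names are likewise our own.

```python
# One-bit-per-nucleotide sketch: a 0 bit becomes A or C, a 1 bit becomes G or T.
# ASSUMPTION: the rule for choosing between the two letters is hypothetical;
# we take the first option unless it would repeat the previous base.

ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def bytes_to_bits(data: bytes) -> str:
    """Expand each byte into its 8-character binary string."""
    return "".join(f"{byte:08b}" for byte in data)


def encode_one_bit_per_base(data: bytes) -> str:
    bases = []
    for bit in bytes_to_bits(data):
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        # Switch to the alternative letter if the first would repeat the last base.
        if bases and bases[-1] == first:
            bases.append(second)
        else:
            bases.append(first)
    return "".join(bases)


def decode_one_bit_per_base(strand: str) -> bytes:
    # A/C read back as 0, G/T as 1; any other letter raises KeyError.
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode_one_bit_per_base(b"H")
print(strand)                           # AGACGACA
print(decode_one_bit_per_base(strand))  # b'H'
```

The redundancy costs density (only one bit per nucleotide), but it buys the freedom to steer away from long runs of a single letter.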

To store a picture, for example, you would start with its encoding as a digital file, such as a JPEG. That file is, in essence, a long string of 0s and 1s. Imagine the first eight bits of the file are 01111000. A denser scheme than Church’s packs two bits into each nucleotide: the bits are broken into pairs, 01 11 10 00, and each pair is mapped to a base (00 = A, 01 = C, 10 = T, 11 = G), giving C-G-T-A. That is the order in which you join the nucleotides to form a DNA strand.
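The worked example above uses a fixed two-bits-per-nucleotide table: 00 = A, 01 = C, 10 = T, 11 = G. A minimal sketch of that mapping follows; the function names are hypothetical, and a real system would add error correction and split the strand into short fragments, which this sketch omits.

```python
# Two-bits-per-nucleotide mapping from the worked example:
# 00 -> A, 01 -> C, 10 -> T, 11 -> G.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_two_bits_per_base(bits: str) -> str:
    """Split a bit string into pairs and map each pair to a nucleotide."""
    if len(bits) % 2:
        raise ValueError("bit string must have an even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_two_bits_per_base(strand: str) -> str:
    """Map each nucleotide back to its two-bit pair."""
    return "".join(BASE_TO_BITS[base] for base in strand)


def file_bytes_to_strand(data: bytes) -> str:
    """Encode raw file bytes (e.g. the contents of a JPEG) as a strand."""
    return encode_two_bits_per_base("".join(f"{byte:08b}" for byte in data))


print(encode_two_bits_per_base("01111000"))  # CGTA
print(decode_two_bits_per_base("CGTA"))      # 01111000
```

This doubles the density of a one-bit-per-nucleotide scheme such as Church’s, but because every bit pair has exactly one letter, the encoder loses the freedom to avoid awkward sequences like AAAA.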

Church and his team successfully encoded around 650 kilobytes of data and read it back, which led them to project a storage density for their method of more than 700 terabytes per cubic millimetre. At the time, this was by far the largest volume of data ever artificially encoded in DNA, and it demonstrated a data density several orders of magnitude greater than that of state-of-the-art storage media. Some estimates put the capacity of a single gram of DNA at roughly a zettabyte, in which case a few kilograms could theoretically store all of humanity’s data.
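For a rough sense of scale, the snippet below applies the two density figures above to IDC’s projected 160 zettabytes a year. It is plain arithmetic on the article’s own numbers, not a physical model. The results differ enormously, likely because the per-cubic-millimetre figure is a projection for Church’s particular method while the per-gram figure is a more speculative estimate.

```python
# Plain arithmetic on the figures quoted in the text; not a physical model.
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes

annual_data_2025 = 160 * ZETTABYTE  # IDC estimate for 2025
church_density = 700 * TERABYTE     # bytes per cubic millimetre (Church's projection)
per_gram_estimate = 1 * ZETTABYTE   # bytes per gram ("roughly a zettabyte")

volume_mm3 = annual_data_2025 / church_density
mass_g = annual_data_2025 / per_gram_estimate

# 1 litre = 1,000,000 cubic millimetres.
print(f"{volume_mm3 / 1e6:.0f} litres at Church's projected density")
print(f"{mass_g:.0f} grams at one zettabyte per gram")
```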

Numerous challenges remain in writing, retrieving and archiving the data. Getting data back out of DNA is a lot harder than putting it in, and DNA is slow and expensive to synthesize because every single molecule must be built with pinpoint precision. For now, mass production is not an option. However, as DNA synthesis continues to improve, scientists believe that DNA could one day become a realistic permanent storage medium for all our data.