Scaling up DNA data storage and random access retrieval

Published: 2017-03-07

Author:

Organick Lee, Ang Siena Dumas, Chen Yuan-Jyue, Lopez Randolph, Yekhanin Sergey, Makarychev Konstantin, Racz Miklos Z., Kamath Govinda, Gopalan Parikshit, Nguyen Bichlien, Takahashi Christopher, Newman Sharon, Parker Hsing-Yeh, Rashtchian Cyrus, Stewart Kendall, Gupta Gagan, Carlson Robert, Mulligan John, Carmean Douglas, Seelig Georg, Ceze Luis, Strauss Karin

Abstract

Current storage technologies can no longer keep pace with exponentially growing amounts of data [1]. Synthetic DNA offers an attractive alternative due to its potential information density of ~10^18 B/mm^3, 10^7 times denser than magnetic tape, and potential durability of thousands of years [2]. Recent advances in DNA data storage have highlighted technical challenges, in particular, coding and random access, but have stored only modest amounts of data in synthetic DNA [3,4,5]. This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200 MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, representing an advance of almost an order of magnitude compared to prior work [6]. Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages [7], a high-definition music video of the band OK Go [8], and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault [9]. We developed a random access methodology based on selective amplification, for which we designed and validated a large library of primers, and successfully retrieved arbitrarily chosen items from a subset of our pool containing 10.3 million DNA sequences. Moreover, we developed a novel coding scheme that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes. We further stress-tested our coding approach by successfully decoding a file using the more error-prone nanopore-based sequencing.
We provide a detailed analysis of errors in the process of writing, storing, and reading data from synthetic DNA at a large scale, which helps characterize DNA as a storage medium and justify our coding approach. Thus, we have demonstrated a significant improvement in data volume, random access, and encoding/decoding schemes that contribute to a whole-system vision for DNA data storage.
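
As a rough sanity check on the scale figures quoted in the abstract, the sketch below derives the net payload density and the average oligonucleotide length from them. It uses only the rounded numbers given above and assumes decimal megabytes (1 MB = 10^6 bytes); the function and constant names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (all approximate).
DATA_BYTES = 200 * 10**6   # "200 MB"; decimal megabytes is an assumption
NUM_OLIGOS = 13 * 10**6    # "more than 13 million" oligonucleotides
TOTAL_NT = 2 * 10**9       # "about 2 billion nucleotides in total"


def bits_per_nucleotide(data_bytes: int, total_nt: int) -> float:
    """Net payload bits stored per synthesized nucleotide."""
    return data_bytes * 8 / total_nt


def mean_oligo_length(total_nt: int, num_oligos: int) -> float:
    """Average number of nucleotides per oligonucleotide."""
    return total_nt / num_oligos


density = bits_per_nucleotide(DATA_BYTES, TOTAL_NT)
length = mean_oligo_length(TOTAL_NT, NUM_OLIGOS)
print(f"{density:.2f} bits/nt, ~{length:.0f} nt per oligo")
# prints "0.80 bits/nt, ~154 nt per oligo"
```

With four bases, a nucleotide can carry at most 2 bits, so about 0.8 bits per nucleotide suggests that a substantial share of each strand goes to something other than payload. Primers and logical redundancy are mentioned in the abstract as likely contributors, but the exact breakdown is not stated there.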

Publisher

Cold Spring Harbor Laboratory

References (16 articles)

1. IDC, Where in the World is Storage, Available at http://www.idc.com/downloads/where_is_storage_infographic_24338.pdf (2013).

2. Grass, R., Heckel, R., Puddu, M., Paunescu, D. & Stark, W. J., Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angewandte Chemie International Edition 54 (8) (2015).

3. Church, G. M., Gao, Y. & Kosuri, S., Next-Generation Digital Information Storage in DNA. Science (2012).

4. Bornholt, J. et al., A DNA-based Archival Storage System. Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2016).

5. Goldman, N. et al., Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature (2013).

Cited by 10 articles.

1. Integrating FPGA Acceleration in the DNAssim Framework for Faster DNA-Based Data Storage Simulations;Electronics;2023-06-10

2. Kernel code for DNA digital data storage;International Journal of Bio-Inspired Computation;2023

3. Data and image storage on synthetic DNA: existing solutions and challenges;EURASIP Journal on Image and Video Processing;2022-10-29

4. Predicting the Occurrence of Variants in RAG1 and RAG2;Journal of Clinical Immunology;2019-08-06

5. Next Steps for Access to Safe, Secure DNA Synthesis;Frontiers in Bioengineering and Biotechnology;2019-04-24