Abstract
DNA has been pursued as a compelling medium for digital data storage over the past decade. While large-scale data storage and random access have been achieved in artificial DNA, synthesis cost continues to keep DNA data storage from entering everyday use. In this study, we proposed a more efficient paradigm for compressing digital data to DNA, while excluding arbitrary sequence constraints. Both standalone neural networks and pre-trained language models were used to extract the intrinsic patterns of the data and generate a probabilistic representation, which was then transformed into constraint-free nucleotide sequences with a hierarchical finite state machine. Using these methods, the compression ratio improved by 12%-26% across various data, which directly translated to a reduction of up to 26% in DNA synthesis cost. Combined with progress in DNA synthesis, our methods are expected to facilitate the realization of practical DNA data storage.
Publisher
Cold Spring Harbor Laboratory