Abstract
This study introduces a novel model for analyzing and determining the sequencing coverage required in DNA-based data storage, focusing on combinatorial DNA encoding. We apply the coupon collector model to the post-sequencing reconstruction of combinatorial letters, to ensure efficient data retrieval and error reduction. We use a Markov chain model to compute the probability of error-free reconstruction, derive theoretical bounds on the decoding probability, and validate these bounds with empirical simulations. The work contributes to the understanding of sequencing coverage in DNA-based data storage, offering insights into decoding complexity, error correction, and sequence reconstruction. We also provide a Python package that takes the code design and other message parameters as input and computes the read coverage required to guarantee reconstruction at a desired confidence level.
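The coupon collector and Markov chain ideas above can be illustrated with a minimal sketch. This is not the authors' package. It assumes a simplified model: each combinatorial letter is a set of `k` distinct components, every read reveals one component per letter position chosen uniformly at random, reads contain no sequencing errors, and letter positions are independent. The Markov chain state is the number of distinct components seen so far. All function names (`collection_probability`, `inclusion_exclusion_probability`, `required_coverage`) and parameters are hypothetical.

```python
import math


def _advance(dist: list[float], k: int) -> list[float]:
    """One read: advance the 'distinct components seen' Markov chain by one step.

    Assumption: the read shows each of the k components with probability 1/k.
    """
    nxt = [0.0] * (k + 1)
    for i, p in enumerate(dist):
        nxt[i] += p * i / k                # read shows an already-seen component
        if i < k:
            nxt[i + 1] += p * (k - i) / k  # read shows a not-yet-seen component
    return nxt


def collection_probability(k: int, reads: int) -> float:
    """P(all k components of one letter are observed after `reads` reads)."""
    dist = [1.0] + [0.0] * k  # start state: no component observed yet
    for _ in range(reads):
        dist = _advance(dist, k)
    return dist[k]


def inclusion_exclusion_probability(k: int, reads: int) -> float:
    """Closed-form coupon collector probability, used only as a cross-check."""
    return sum((-1) ** j * math.comb(k, j) * ((k - j) / k) ** reads
               for j in range(k + 1))


def required_coverage(k: int, num_letters: int, confidence: float,
                      max_reads: int = 100_000) -> int:
    """Smallest read count R such that P(every letter is reconstructed) >= confidence.

    Hypothetical simplification: letters are independent, so the whole
    message succeeds with probability collection_probability(k, R) ** num_letters.
    """
    if k < 1 or num_letters < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("need k >= 1, num_letters >= 1 and 0 < confidence < 1")
    dist = [1.0] + [0.0] * k
    for reads in range(1, max_reads + 1):
        dist = _advance(dist, k)
        if dist[k] ** num_letters >= confidence:
            return reads
    raise ValueError("confidence not reached within max_reads")


if __name__ == "__main__":
    # Hypothetical design: 5 components per letter, 1,000 letter positions.
    print(required_coverage(k=5, num_letters=1000, confidence=0.99))
```

Stepping the chain once per read lets `required_coverage` stop at the first read count that meets the target, without recomputing the distribution from scratch. The inclusion-exclusion formula provides an independent check of the chain for small `k`. A model with sequencing errors or non-uniform component sampling would need different transition probabilities.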
Publisher: Cold Spring Harbor Laboratory