Affiliation:
1. School of Mathematics, Tianjin University, Tianjin, 300372, China
2. Center for Applied Mathematics, Tianjin University, Tianjin, 300372, China
Abstract
Deoxyribonucleic acid (DNA) is an attractive medium for long-term digital data storage because of its extremely high storage density, low maintenance cost and longevity. However, when DNA sequences containing long homopolymer runs are synthesized, amplified and sequenced, three types of errors occur frequently: insertions, deletions and substitutions. In addition, DNA sequences with a large imbalance between GC and AT content exhibit high dropout rates and are prone to errors. These limitations severely hinder the widespread use of DNA-based data storage. To reduce and correct these errors, this paper proposes a novel coding scheme called DNA-LC, which converts binary sequences into DNA base sequences that satisfy both the GC-balance and run-length constraints. Furthermore, the scheme can detect and correct multiple errors, giving it a higher error-correction capability than other methods, which target single-error correction within a single strand. The decoding algorithm has been implemented in practice. Simulation results indicate that the proposed coding scheme offers strong error protection for DNA sequences. The source code is freely accessible at https://github.com/XiayangLi2301/DNA.
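As an illustration of the two constraints named in the abstract, the following minimal Python sketch checks whether a DNA string has balanced GC content and no overly long homopolymer run. It is not taken from the DNA-LC source code: the function names and the thresholds (`max_run=3`, GC content between 0.4 and 0.6) are assumptions chosen only for the example, since the abstract does not state the values the paper uses.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def max_run_length(seq: str) -> int:
    """Return the length of the longest homopolymer run (0 if seq is empty)."""
    longest = 0
    current = 0
    previous = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check both constraints; all thresholds are illustrative assumptions."""
    return (max_run_length(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGT"))  # True: GC content 0.5, longest run 1
    print(satisfies_constraints("AAAACGCG"))  # False: homopolymer run of length 4
```

A checker of this kind only verifies a candidate sequence; it does not perform the binary-to-DNA encoding or the error correction that the abstract attributes to DNA-LC.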
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology, Information Systems
Cited by
14 articles.