Affiliation:
1. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract
Polymerase chain reaction (PCR) amplification is widely used to retrieve information from DNA storage. During PCR, nonspecific pairing between the 3’ end of a primer and a DNA sequence can cause cross-talk in the amplification reaction, which generates interfering sequences and reduces amplification accuracy. To address this issue, we propose an efficient coding algorithm for PCR amplification information retrieval (ECA-PCRAIR). The algorithm uses variable-length scanning and pruning optimization to construct a codebook that maximizes storage density while satisfying traditional biological constraints. A codeword search tree is then built from the primer library to optimize the codebook, and a variable-length interleaver performs constraint detection and correction, minimizing the likelihood of nonspecific pairing. Experimental results show that ECA-PCRAIR reduces the probability of nonspecific pairing between the 3’ end of the primer and the DNA sequence to 2–25%, improving the robustness of the DNA sequences. ECA-PCRAIR also achieves a storage density of 2.14–3.67 bits per nucleotide (bits/nt), substantially increasing storage capacity.
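The abstract does not spell out the screening rules, so the following is only a rough illustration of the idea of filtering a codebook against primer 3’-end pairing. It assumes fixed-length candidate codewords, uses GC content and homopolymer run length as stand-ins for the "traditional biological constraints" (the 0.4–0.6 GC window and maximum run of 3 are assumed thresholds), and treats a codeword as risky if it contains a binding site for the last `k` bases of any primer on either strand (`k = 4` is also an assumption). It is not ECA-PCRAIR: it omits the variable-length scanning, pruning, codeword search tree, and interleaver, and every function name here is hypothetical.

```python
# Hypothetical sketch: screen candidate codewords for simple constraints and
# for possible primer 3'-end binding sites. Not the ECA-PCRAIR algorithm.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases in a non-empty sequence."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def pairs_with_primer_3prime(seq: str, primer: str, k: int = 4) -> bool:
    """True if the primer's last k bases could anneal to seq or its complement.

    The 3' tail anneals to a strand containing its reverse complement.
    Checking for reverse_complement(tail) covers the stored strand; checking
    for the tail itself covers the complementary strand (assumed double-stranded).
    """
    tail = primer[-k:]
    return reverse_complement(tail) in seq or tail in seq


def screen_codebook(codewords, primers, k=4, gc_range=(0.4, 0.6), max_run=3):
    """Keep codewords that pass the assumed GC/homopolymer limits and have
    no binding site for any primer's 3' tail."""
    kept = []
    for word in codewords:
        if not gc_range[0] <= gc_fraction(word) <= gc_range[1]:
            continue  # GC content outside the assumed window
        if max_homopolymer(word) > max_run:
            continue  # homopolymer run too long
        if any(pairs_with_primer_3prime(word, p, k) for p in primers):
            continue  # risk of nonspecific 3'-end pairing
        kept.append(word)
    return kept


if __name__ == "__main__":
    candidates = ["ACGTTGCA", "TGCCTAAC", "AAAACGCG", "GGCCGGCA", "TCAGGCTA"]
    print(screen_codebook(candidates, ["GATCCGTTAGGC"]))
```

With the hypothetical primer `GATCCGTTAGGC`, only `ACGTTGCA` survives: `TGCCTAAC` and `TCAGGCTA` contain a site for the tail `AGGC`, `AAAACGCG` has a four-base homopolymer, and `GGCCGGCA` is too GC-rich. A real scheme would also weigh partial (mismatched) pairing and thermodynamic stability rather than exact substring matches.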
Funder
Tianjin Science and Technology Planning Project