Authors:
Sabary Omer, Yucovich Alexander, Shapira Guy, Yaakobi Eitan
Abstract
Motivated by DNA storage systems, this work presents the DNA reconstruction problem, in which a length-n string is transmitted through the DNA-storage channel, which introduces deletion, insertion, and substitution errors. This channel generates multiple noisy copies of the transmitted string, which are called traces. A DNA reconstruction algorithm is a mapping that receives t traces as input and produces an estimate of the original string. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm’s estimate. In this work, we present several new algorithms for this problem. Our algorithms look globally at the entire sequence of the traces and use the dynamic programming algorithms for the shortest common supersequence and longest common subsequence problems to decode the original string. Our algorithms impose no restrictions on the input or on the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data, on data from previous DNA storage experiments, and on a new synthesized dataset, and they are shown to outperform previous algorithms in reconstruction accuracy.
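The abstract names three standard tools: the edit distance used to score an estimate, and the dynamic programs for the longest common subsequence (LCS) and shortest common supersequence (SCS). The sketch below is not the authors' reconstruction algorithm; it only illustrates these textbook building blocks, together with a toy insertion/deletion/substitution channel for generating traces. The function names, the i.i.d. per-symbol error model, and the parameters `p_del`, `p_ins`, and `p_sub` are illustrative assumptions, not taken from the paper.

```python
import random


def ids_channel(x: str, p_del: float, p_ins: float, p_sub: float,
                rng: random.Random, alphabet: str = "ACGT") -> str:
    """Toy channel (assumed model, not the paper's): before each symbol, insert a
    random base with prob. p_ins; then delete the symbol with prob. p_del or
    substitute it with a different base with prob. p_sub."""
    out = []
    for c in x:
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))
        r = rng.random()
        if r < p_del:
            continue  # deletion: symbol is dropped
        if r < p_del + p_sub:
            out.append(rng.choice([s for s in alphabet if s != c]))  # substitution
        else:
            out.append(c)  # correct transmission
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[-1]


def lcs_table(a: str, b: str) -> list[list[int]]:
    """dp[i][j] = length of an LCS of a[:i] and b[:j]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def lcs(a: str, b: str) -> str:
    """Recover one longest common subsequence by tracing the table backwards."""
    dp, i, j, out = lcs_table(a, b), len(a), len(b), []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def scs(a: str, b: str) -> str:
    """One shortest common supersequence; its length is len(a) + len(b) - LCS."""
    dp, i, j, out = lcs_table(a, b), len(a), len(b), []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1  # shared symbol emitted once
        elif dp[i - 1][j] >= dp[i][j - 1]:
            out.append(a[i - 1]); i -= 1
        else:
            out.append(b[j - 1]); j -= 1
    while i > 0:
        out.append(a[i - 1]); i -= 1
    while j > 0:
        out.append(b[j - 1]); j -= 1
    return "".join(reversed(out))


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting symbols."""
    it = iter(t)
    return all(c in it for c in s)
```

In use, one would draw a random design string, pass it through `ids_channel` several times to obtain traces, and score any candidate estimate with `edit_distance(original, estimate)`, the quantity the abstract states reconstruction aims to minimize. The LCS of two traces keeps only symbols they agree on (useful against insertions), while their SCS keeps symbols present in either (useful against deletions); how the paper combines these across t traces is not described here.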
Funders
European Union
Israel Innovation Authority
Publisher
Springer Science and Business Media LLC
Cited by
3 articles.
1. DNA Storage Toolkit: A Modular End-to-End DNA Data Storage Codec and Simulator; 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS); 2024-05-05
2. GradHC: Highly Reliable Gradual Hash-Based Clustering for DNA Storage Systems; Bioinformatics; 2024-04-22
3. An Instance-Based Approach to the Trace Reconstruction Problem; 2024 58th Annual Conference on Information Sciences and Systems (CISS); 2024-03-13