Affiliation:
1. Department of Computer Science, University of Helsinki, Helsinki 00560, Finland
2. School of Computing, Montana State University, Bozeman, MT 59717, United States
Abstract
Motivation
Many important problems in bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to dealing with this uncertainty is to find “safe” partial solutions (e.g. contigs) that are common to all solutions. Previous research on safety has focused on polynomial-time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of “safety tools” for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, “minimum flow decomposition” (MFD). We obtain our results by developing a “safety test” for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed at reducing the total ILP time, the most efficient of these being based on a recursive group-testing procedure.
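The recursive group-testing idea mentioned above can be illustrated generically. The sketch below is not the authors' implementation: `find_flagged` and the `group_is_flagged` oracle are hypothetical names, and the oracle merely stands in for one expensive yes/no query (such as a single ILP call) asked about a whole group of candidates at once. A group that tests negative is discarded with one query; a group that tests positive is split in half and each half is tested recursively.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_flagged(
    items: Sequence[T],
    group_is_flagged: Callable[[Sequence[T]], bool],
) -> List[T]:
    """Return every item the oracle flags, using recursive halving.

    `group_is_flagged(group)` is a hypothetical oracle that returns True
    if at least one item in `group` is flagged. In an ILP setting it
    would correspond to one solver call on the whole group.
    """
    # An empty or entirely negative group is dismissed with one query.
    if not items or not group_is_flagged(items):
        return []
    # A positive group of size one pins down a flagged item.
    if len(items) == 1:
        return [items[0]]
    # Otherwise split the group and test each half recursively.
    mid = len(items) // 2
    return (
        find_flagged(items[:mid], group_is_flagged)
        + find_flagged(items[mid:], group_is_flagged)
    )


# Toy usage: candidate labels are placeholders, and the oracle is a
# simple set lookup standing in for the expensive test.
flagged_truth = {"p3", "p7"}
candidates = [f"p{i}" for i in range(8)]
found = find_flagged(
    candidates,
    lambda group: any(p in flagged_truth for p in group),
)
```

Generic group testing of this kind saves queries mainly when flagged items are rare, since large negative groups are then eliminated in a single test; when many items are flagged, testing them one by one can be cheaper.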
Results
Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27 000 non-trivial graphs of this dataset in only 1.5 h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.
Availability and implementation
https://github.com/algbio/mfd-safety.
Funder
European Research Council
European Union’s Horizon 2020 research and innovation programme
Academy of Finland
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
1 article.
1. Width Helps and Hinders Splitting Flows; ACM Transactions on Algorithms; 2024-03-13