Author:
Yu Changyong,Lin Pengxi,Zhao Yuhai,Ren Tianmei,Wang Guoren
Abstract
AbstractBackgroundIn various fields, searching for the Longest Common Subsequences (LCS) of Multiple (i.e., three or more) sequences (MLCS) is a classic but difficult problem to solve. The primary bottleneck in this problem is that present state-of-the-art algorithms require the construction of a huge graph (called a direct acyclic graph, or DAG), which the computer usually has not enough space to handle. Because of their massive time and space consumption, present algorithms are inapplicable to issues with lengthy and large-scale sequences.ResultsA mini Directed Acyclic Graph (mini-DAG) model and a novel Path Elimination Algorithm are proposed to address large-scale MLCS issues efficiently. In mini-DAG, we employ the branch and bound approach to reduce paths during DAG construction, resulting in a very mini DAG (mini-DAG), which saves memory space and search time.ConclusionEmpirical experiments have been performed on a standard benchmark set of DNA sequences. The experimental results show that our model outperforms the leading algorithms, especially for large-scale MLCS problems.
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology