Affiliation:
1. School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China
2. Institute of Basic and Frontier Sciences, University of Electronic Science and Technology of China and Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdu, Sichuan, China
Abstract
Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms struggle to handle large numbers of sequences, especially long ones. Some recent aligners therefore adopt an efficient divide-and-conquer strategy that divides long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) at which the sequences are divided is critical, as it directly affects both accuracy and time cost. To this end, we propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use an FM-index to extract long common segments at low cost, rather than a space-consuming hash table. After the optimal long common segments are found, the sequences are divided at these segments. FMAlign has been tested on virus genome, bacteria genome and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in running time and achieves high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequence sets in terms of both sequence length and number of sequences. The source code and related data are accessible at https://github.com/iliuh/FMAlign.
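The idea of finding shared segments with an FM-index and then splitting every sequence at them can be illustrated with a toy sketch. This is not FMAlign's implementation. It assumes a naive suffix-array construction, a hypothetical `#` separator between sequences, and a simplified, hypothetical anchor rule: the first fixed-length k-mer of the first sequence that occurs exactly once in every sequence. FMAlign itself selects longer optimal common segments. The FM-index part (C table, occurrence counts, backward search, locating hits through the suffix array) follows the standard textbook construction.

```python
from bisect import bisect_right

SEP = "#"       # hypothetical separator between sequences (not A/C/G/T)
SENTINEL = "$"  # unique terminator, so no suffix is a prefix of another

def build_fm_index(seqs):
    """Concatenate seqs and build a toy FM-index (naive suffix sort)."""
    text = SEP.join(seqs) + SENTINEL
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    alphabet = sorted(set(text))
    # C[c] = number of symbols in text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    # start offset of each sequence in text, to map hits back to sequences
    starts, pos = [], 0
    for s in seqs:
        starts.append(pos)
        pos += len(s) + 1
    return {"sa": sa, "C": C, "occ": occ, "starts": starts, "n": len(text)}

def backward_search(fm, pattern):
    """Return the half-open suffix-array interval of rows prefixed by pattern."""
    lo, hi = 0, fm["n"]
    for c in reversed(pattern):
        if c not in fm["C"]:
            return 0, 0
        lo = fm["C"][c] + fm["occ"][c][lo]
        hi = fm["C"][c] + fm["occ"][c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi

def hits_per_sequence(fm, pattern):
    """Map each occurrence of pattern to {sequence index: [offsets]}."""
    lo, hi = backward_search(fm, pattern)
    hits = {}
    for row in range(lo, hi):
        p = fm["sa"][row]
        k = bisect_right(fm["starts"], p) - 1  # last sequence starting <= p
        hits.setdefault(k, []).append(p - fm["starts"][k])
    return hits

def find_anchor(seqs, k):
    """Hypothetical rule: first k-mer of seqs[0] found exactly once in every sequence."""
    fm = build_fm_index(seqs)
    for i in range(len(seqs[0]) - k + 1):
        kmer = seqs[0][i:i + k]
        hits = hits_per_sequence(fm, kmer)
        if len(hits) == len(seqs) and all(len(v) == 1 for v in hits.values()):
            return kmer, [hits[j][0] for j in range(len(seqs))]
    return None

def split_at_anchor(seqs, k):
    """Divide every sequence into (left, anchor, right) around the shared anchor."""
    found = find_anchor(seqs, k)
    if found is None:
        return None
    kmer, offsets = found
    return [(s[:o], kmer, s[o + k:]) for s, o in zip(seqs, offsets)]

seqs = ["AAGATTACACC", "AGATTACAGG", "CCGATTACAT"]
print(split_at_anchor(seqs, 6))
# [('AA', 'GATTAC', 'ACC'), ('A', 'GATTAC', 'AGG'), ('CC', 'GATTAC', 'AT')]
```

In a divide-and-conquer aligner, the left and right pieces would then be aligned separately (or split again recursively) and concatenated around the anchor. The FM-index answers "how many times, and where, does this segment occur" without storing a hash table of every k-mer, which is the space argument made above; a production index would use compressed rank structures and a sampled suffix array instead of the dense tables here.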
Funder
Fund for Foreign Scholars in University Research and Teaching Programs
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology, Information Systems
Cited by
7 articles.