Kalign – an accurate and fast multiple sequence alignment algorithm

Published: 2005-12 Issue: 1 Volume: 6 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Lassmann Timo, Sonnhammer Erik LL

Abstract

Background: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics.

Results: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.

Conclusion: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-6-298.pdf

References: 37 articles.

1. Notredame C: Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 2002, 3: 131–144.

2. Felsenstein J: PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics 1989, 5: 164–166.

3. Sjolander K: Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics 2004, 20(2):170–179.

4. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, (32 Database):138–141.

5. Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48(3):443–453.

Cited by 569 articles.

1. SARS-CoV-2 Genotyping Highlights the Challenges in Spike Protein Drift Independent of Other Essential Proteins; Microorganisms; 2024-09-09

2. Several secondary metabolite gene clusters in the genomes of ten Penicillium spp. raise the risk of multiple mycotoxin occurrence in chestnuts; Food Microbiology; 2024-09

3. Intra-genomic genes-to-genes correlation enables genome representation; 2024-06-14

4. Barcoding of Asteraceae plants of juniper ecosystem Ziarat, Balochistan; Pakistan Journal of Botany; 2024-04-25

5. Performance Analysis of Multiple Sequence Alignment Tools; Proceedings of the 2024 ACM Southeast Conference on ZZZ; 2024-04-18