Automatic detection of anchor points for multiple sequence alignment-Reference-Cited by-同舟云学术

Automatic detection of anchor points for multiple sequence alignment

Published:2010-09-02 Issue:1 Volume:11 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Pitschi Florian,Devauchelle Claudine,Corel Eduardo

Abstract

Abstract Background Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are be expected to be more adequate, as more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment. Results We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearence. In this paper, we describe first how to get, with the help of this method, subsets of positions that could form partial columns in an alignment. We introduce next a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. Partial columns can be used, for the time being, as guide only by a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We perform tests on the effect of introducing these columns on the popular benchmark BAliBASE 3. Conclusions We show that the inclusion of our partial alignment columns, as anchor points, improve on the whole the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-11-445.pdf

Reference22 articles.

1. Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology 1994, 28–36.

2. Smith RF, Smith TF: Pattern-Induced Multi-sequence Alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for comparitive protein modelling. Protein Engineering 1992, 5: 35–41. 10.1093/protein/5.1.35

3. Subramanian AR, Kaufmann M, Morgenstern B: DIALIGN-TX: greedy and progressive approaches for the segment-based multiple sequence alignment. Algorithms for Molecular Biology 2008, 3: 6. 10.1186/1748-7188-3-6

4. Notredame C, Higgins D, Heringa J: T-Coffee: a novel algorithm for multiple sequence alignment. J Mol Biol 2000, 302: 205–217. 10.1006/jmbi.2000.4042

5. Edgar R: MUSCLE: Multiple sequence alignment with high score accuracy and high throughput. Nuc Acids Res 2004, 32: 1792–1797. 10.1093/nar/gkh340

Cited by 4 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Vector-clustering Multiple Sequence Alignment: Aligning into the twilight zone of protein sequence similarity with protein language models;2022-10-21

2. Conserved Sequence Analysis of Influenza A Virus HA Segment and Its Application in Rapid Typing;Diagnostics;2021-07-23

3. SNN-SB: Combining Partial Alignment Using Modified SNN Algorithm with Segment-Based for Multiple Sequence Alignments;Journal of Physics: Conference Series;2021-07-01

4. A greedy, graph-based algorithm for the alignment of multiple homologous gene lists;Bioinformatics;2011-01-06