WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity

Published: 2023-01-01
Volume: 3, Issue: 1
ISSN: 2635-0041
Journal: Bioinformatics Advances
Language: en

Authors:

Liu Baqiao¹, Warnow Tandy¹

Affiliation:

1. Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA

Abstract

Summary

Multiple sequence alignment is a basic part of many bioinformatics pipelines, including phylogeny estimation, structure prediction for both RNAs and proteins, and metagenomic sequence analysis. Yet many sequence datasets exhibit substantial sequence length heterogeneity, both because of large insertions and deletions in the evolutionary history of the sequences and because the input includes unassembled reads or incompletely assembled sequences. A few methods can align datasets with sequence length heterogeneity with high accuracy: UPP was one of the first to achieve good accuracy, and WITCH is a recent improvement on UPP in accuracy. In this article, we show how to speed up WITCH. Our improvement includes replacing a critical step in WITCH (currently performed using a heuristic search) with a polynomial-time exact algorithm based on Smith–Waterman. Our new method, WITCH-NG (i.e. 'next generation WITCH'), achieves the same accuracy but is substantially faster. WITCH-NG is available at https://github.com/RuneBlaze/WITCH-NG.

Availability and implementation

The datasets used in this study are from prior publications and are freely available in public repositories, as indicated in the Supplementary Materials.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.
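The abstract names Smith–Waterman as the exact, polynomial-time replacement for a heuristic search step in WITCH. As a rough illustration of that technique only, the sketch below computes the textbook pairwise Smith–Waterman local-alignment score with a linear gap penalty. It is not WITCH-NG's code: the scoring model WITCH-NG actually uses, and what it aligns against, are not described in this record, and the function name `smith_waterman` and the match/mismatch/gap values are illustrative assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Textbook Smith-Waterman with a linear gap penalty (gap is added per gap
    position, so it should be negative). Scoring values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0 (empty prefixes).
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap
            # The 0 lets a local alignment restart anywhere (the "local" part).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("GTTAC", "GTTGAC", match=3, mismatch=-3, gap=-2)` returns 13, from the local alignment GTT-AC against GTTGAC (five matches, one gap). Filling the (m+1)×(n+1) table takes O(mn) time and is guaranteed to find the optimal local score, which is the property that distinguishes an exact dynamic program from a heuristic search.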

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Cell Biology, Developmental Biology, Embryology, Anatomy

Link

https://academic.oup.com/bioinformaticsadvances/advance-article-pdf/doi/10.1093/bioadv/vbad024/49434486/vbad024.pdf

Cited by 3 articles

1. Optimizing Data Parallelism for FM-Based Short-Read Alignment on the Heterogeneous Non-Uniform Memory Access Architectures;Future Internet;2024-06-19

2. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment;Algorithms for Molecular Biology;2023-12-07

3. EMMA: A New Method for Computing Multiple Sequence Alignments given a Constraint Subset Alignment;2023-06-12