Fast multiple sequence alignment via multi-armed bandits

Published: 2024-06-28 Issue: Supplement_1 Volume: 40 Pages: i328-i336
ISSN: 1367-4803
Container-title: Bioinformatics
Language: en

Author:

Mazooji Kayvon¹, Shomorony Ilan¹

Affiliation:

1. Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States

Abstract

Summary: Multiple sequence alignment is an important problem in computational biology, with applications that include phylogeny and the detection of remote homology between protein sequences. UPP is a popular software package that constructs accurate multiple sequence alignments for large datasets based on ensembles of hidden Markov models (HMMs). A computational bottleneck for this method is a sequence-to-HMM assignment step, which relies on the precise computation of probability scores on the HMMs. In this work, we show that this assignment step can be sped up significantly by replacing these HMM probability scores with alternative scores that can be estimated efficiently. Our proposed approach uses a multi-armed bandit algorithm to compute estimates of these scores adaptively. This allows us to achieve alignment accuracy similar to that of UPP with a significant reduction in computation time, particularly for datasets with long sequences.

Availability and implementation: The code used to produce the results in this paper is available on GitHub at https://github.com/ilanshom/adaptiveMSA.
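To illustrate the general idea of adaptive score estimation mentioned in the summary above, here is a minimal Python sketch of a generic bandit routine (sequential halving). It is not the authors' implementation and does not use UPP or any HMM library. Everything in it is an assumption made for illustration: each candidate (an "arm", standing in for one HMM) is represented by a hypothetical list of per-position scores, the quantity of interest is the mean of that list, and the function name `sequential_halving` and its `budget` parameter are invented here. The sketch samples a few positions per candidate, discards the weaker half, and repeats, so that most of the sampling effort goes to the strongest candidates.

```python
import math
import random


def sequential_halving(arms, budget, rng):
    """Return the index of the arm with the highest estimated mean.

    arms:   list of non-empty lists of floats; arms[i] holds hypothetical
            per-position scores for candidate i (an illustrative stand-in,
            not the scores used in the paper).
    budget: approximate total number of samples to draw over all rounds.
    rng:    a random.Random instance used to sample positions.
    """
    if not arms:
        raise ValueError("at least one arm is required")

    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    totals = [0.0] * len(arms)  # running sum of sampled scores per arm
    counts = [0] * len(arms)    # number of samples drawn per arm

    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split this round's share of the budget evenly among survivors.
        pulls = max(1, budget // (len(survivors) * rounds))
        for i in survivors:
            for _ in range(pulls):
                totals[i] += rng.choice(arms[i])
                counts[i] += 1
        # Keep the better half, ranked by the mean of all samples so far
        # (samples are accumulated across rounds in this sketch).
        survivors.sort(key=lambda i: totals[i] / counts[i], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]

    return survivors[0]


# Usage on synthetic data: eight candidates, one clearly better than the rest.
rng = random.Random(0)
true_means = (0.0, 0.2, 0.1, 3.0, 0.3, 0.0, 0.1, 0.2)
arms = [[rng.gauss(mu, 1.0) for _ in range(200)] for mu in true_means]
best = sequential_halving(arms, budget=2400, rng=rng)
print("selected candidate:", best)
```

In this toy setting the routine draws roughly 2400 samples in total, whereas computing every candidate's score exactly would read all 1600 stored values per pass but scale with sequence length in a real workload; the potential saving comes from spending fewer samples on candidates that are eliminated early.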

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/article-pdf/40/Supplement_1/i328/58354964/btae225.pdf

References (36 articles)

1. Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction;Antkowiak;Nat Commun,2020

2. Bandit-based monte carlo optimization for nearest neighbors;Bagaria;IEEE J Sel Areas Inf Theory,2021

3. Ultra fast medoid identification via correlated sequential halving;Baharav;Adv Neural Inf Process Syst,2019

4. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing;Berlin;Nat Biotechnol,2015