ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Author:

Firtina Can1, Pillai Kamlesh2, Kalsi Gurpreet S.2, Suresh Bharathwaj2, Cali Damla Senol3, Kim Jeremie S.1, Shahroodi Taha4, Cavlak Meryem Banu1, Lindegger Joël1, Alser Mohammed1, Luna Juan Gómez1, Subramoney Sreenivas2, Mutlu Onur1

Affiliation:

1. ETH Zurich, Switzerland

2. Intel Labs, USA

3. Carnegie Mellon University, USA

4. TU Delft, Netherlands

Abstract

Profile hidden Markov models (pHMMs) are widely used in bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, uses these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out unnecessary computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×-260.03×, 1.83×-5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29×-59.94×, 1.03×-1.75×, and 1.03×-1.95×, respectively, while improving their energy efficiency by 64.24×-115.46×, 1.75×, and 1.96×, respectively.
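
To make the similarity-score computation described above concrete, the following is a minimal software sketch of the forward and backward passes that the Baum-Welch algorithm builds on. It is not the ApHMM design or any code from the paper. The three-state left-to-right model, its probabilities, and the names `forward`, `backward`, and `similarity_score` are all illustrative assumptions, and the model simplifies a real pHMM, which typically has separate match, insertion, and deletion states.

```python
# Toy left-to-right HMM over the DNA alphabet (A=0, C=1, G=2, T=3).
# Assumption: self-loops stand in for insertions, skip edges for deletions,
# and off-consensus emissions for substitutions. All values are made up.
A = [  # A[i][j] = probability of moving from state i to state j
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]
B = [  # B[i][c] = probability that state i emits character c
    [0.85, 0.05, 0.05, 0.05],  # consensus A
    [0.05, 0.85, 0.05, 0.05],  # consensus C
    [0.05, 0.05, 0.85, 0.05],  # consensus G
]
PI = [1.0, 0.0, 0.0]  # always start in the first state


def forward(A, B, pi, obs):
    """alpha[t][j] = P(obs[0..t], state at time t is j)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append(
            [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        )
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(obs[t+1..] | state at time t is i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(
            0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n)) for i in range(n)]
        )
    return beta


def similarity_score(A, B, pi, obs):
    """Likelihood of the whole sequence under the model (sum of final alphas)."""
    return sum(forward(A, B, pi, obs)[-1])


score_acg = similarity_score(A, B, PI, [0, 1, 2])  # matches the consensus
score_ttt = similarity_score(A, B, PI, [3, 3, 3])  # matches nowhere
```

In this sketch a sequence that follows the consensus scores higher than one that does not. Baum-Welch training combines the `alpha` and `beta` tables to re-estimate the transition and emission probabilities, and it is this per-timestep, per-state work that the abstract describes as computationally intensive.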

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

