BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs-Reference-Cited by-同舟云学术

BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs

Published:2020-03 Issue:S2 Volume:21 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Fostier Jan

Abstract

Abstract Background The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources for which a number of efficient yet complex algorithms have been proposed. Results We propose BLAMM, a simple and efficient tool inspired by high performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs and has a runtime that is independent of the selected p-value. In terms of single-core performance, it is competitive with state-of-the-art software for PWM matching while being much more efficient when using multithreading. Additionally, BLAMM requires negligible memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 13 min with a p-value of 10−4 using a 36-core machine. On a dual GPU system, the same task can be performed in under 5 min. Conclusions BLAMM is an efficient tool for identifying PWM matches in large DNA sequences. Its C++ source code is available under the GNU General Public License Version 3 at https://github.com/biointec/blamm.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/s12859-020-3348-6.pdf

Reference32 articles.

1. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000; 16(1):16–23. https://doi.org/10.1093/bioinformatics/16.1.16.

2. Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon JA, van der Lee R, Bessy A, Chèneby J, Kulkarni SR, Tan G, Baranasic D, Arenillas DJ, Sandelin A, Vandepoele K, Lenhard B, Ballester B, Wasserman WW, Parcy F, Mathelier A. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 2018; 46(D1):260–6. https://doi.org/10.1093/nar/gkx1126.

3. Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009; 37(Database issue):77–82. https://doi.org/10.1093/nar/gkn660.

4. Wingender E, Dietze P, Karas H, Knüppel R. TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996; 24(1):238–41. https://doi.org/10.1093/nar/24.1.238.

5. Nishida K, Frith MC, Nakai K. Pseudocounts for transcription factor binding sites. Nucleic Acids Res. 2009; 37(3):939–44. https://doi.org/10.1093/nar/gkn1019.

Cited by 4 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Deep learning the cis-regulatory code for gene expression in selected model plants;Nature Communications;2024-04-25

2. Quantum Algorithm for Position Weight Matrix Matching;IEEE Transactions on Quantum Engineering;2023

3. Searching Pattern in DNA Sequence Using ECC-Diffie-Hellman Exchange Based Hash Function: An Efficient Approach;Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021);2021-09-30

4. Main findings and advances in bioinformatics and biomedical engineering- IWBBIO 2018;BMC Bioinformatics;2020-05