Abstract
Tandem repeats (TRs) are segments that occur several times in a DNA sequence, with each copy adjacent to the next. In the last few years, TRs have gained significant attention because they are thought to be associated with certain human diseases. Identifying and classifying TRs has therefore become an important task in bioinformatics, in order to study their links to disorders and illnesses. Dot2dot, a recently developed tool for finding TRs, provides more accurate results than the previous state of the art, but it requires long execution times even when using multiple threads. This work presents MPI-dot2dot, a novel version of this tool that combines MPI and OpenMP so that it can run on a cluster of multicore nodes and thus reduce its execution time. The performance of this parallel implementation has been tested on several real datasets. It obtains the same biological results as Dot2dot and, depending on the characteristics of the input genomes, runs more than 100 times faster on a 16-node multicore cluster (384 cores). MPI-dot2dot is publicly available for download from https://sourceforge.net/projects/mpi-dot2dot.
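To make the definition of a tandem repeat concrete, the following is a minimal sketch that scans a sequence for exact, adjacent copies of a short motif. It is not the Dot2dot or MPI-dot2dot algorithm (those tools also handle approximate repeats and run in parallel); the function name `find_exact_tandem_repeats` and its parameters `max_motif_len` and `min_copies` are assumptions introduced only for illustration.

```python
def find_exact_tandem_repeats(seq, max_motif_len=6, min_copies=2):
    """Return a list of (start, motif, copies) for exact tandem repeats.

    Illustrative brute-force scan; hypothetical helper, not Dot2dot's method.
    """
    results = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        for m in range(1, max_motif_len + 1):
            motif = seq[i:i + m]
            if len(motif) < m:
                break  # not enough sequence left for a motif of this length
            # Count how many adjacent copies of the motif start at position i.
            copies = 1
            while seq[i + copies * m:i + (copies + 1) * m] == motif:
                copies += 1
            if copies >= min_copies:
                span = copies * m
                # Keep the longest span; ties keep the shorter motif found first.
                if best is None or span > best[2] * len(best[1]):
                    best = (i, motif, copies)
        if best is not None:
            results.append(best)
            i += best[2] * len(best[1])  # skip past the reported repeat
        else:
            i += 1
    return results


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGACACACTT", min_copies=3))
```

For the sequence `GGACACACTT` with `min_copies=3`, the sketch reports a single repeat: the motif `AC` starting at index 2 with three adjacent copies.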
Funder
Ministerio de Ciencia e Innovación
Xunta de Galicia
Universidade da Coruña
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture, Information Systems, Theoretical Computer Science, Software