Affiliation:
1. Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology, Tehran, Iran
2. School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Abstract
Background:
One of the pivotal challenges in today's genomic research is the fast
processing of voluminous data, such as that generated by high-throughput Next-Generation
Sequencing technologies. However, BLAST (Basic Local Alignment Search Tool), a long-established
and renowned tool in bioinformatics, has been shown to be incredibly slow in this regard.
Objective:
To improve the performance of BLAST on voluminous data, we have
applied a novel memory-aware technique to BLAST for faster parallel processing.
Method:
We have used a master-worker model for processing voluminous data alongside a
memory-aware technique: the master partitions the whole data into equal chunks, one chunk for
each worker, and each worker then further splits and formats its allocated chunk according
to the size of its memory. Each worker searches its split fragments one by one against a list of queries.
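The two-level partitioning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `budget_bytes` parameter, the choice to balance chunks by record count, and the substring match standing in for a real BLAST search are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def partition_equal(records: Sequence[str], n_workers: int) -> List[List[str]]:
    """Master step: divide records into n_workers chunks of near-equal count.

    Assumption: "equal chunks" is approximated here by record count.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(len(records), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers each take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


def split_by_memory(chunk: Sequence[str], budget_bytes: int) -> List[List[str]]:
    """Worker step: greedily pack records into fragments within budget_bytes.

    budget_bytes is a hypothetical stand-in for the worker's usable memory.
    A single record larger than the budget gets a fragment of its own.
    """
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    fragments: List[List[str]] = []
    current: List[str] = []
    used = 0
    for rec in chunk:
        size = len(rec.encode("utf-8"))
        if current and used + size > budget_bytes:
            fragments.append(current)
            current, used = [], 0
        current.append(rec)
        used += size
    if current:
        fragments.append(current)
    return fragments


def search_fragment(fragment: Sequence[str], queries: Sequence[str]) -> List[Tuple[str, str]]:
    """Placeholder for a BLAST search: naive substring matching only."""
    return [(q, rec) for q in queries for rec in fragment if q in rec]


def worker(chunk: Sequence[str], queries: Sequence[str], budget_bytes: int) -> List[Tuple[str, str]]:
    """Search each memory-sized fragment in turn against the full query list."""
    hits: List[Tuple[str, str]] = []
    for fragment in split_by_memory(chunk, budget_bytes):
        hits.extend(search_fragment(fragment, queries))
    return hits
```

In this sketch only one fragment is searched at a time, so a worker's working set stays near its budget, which is the property the memory-aware technique relies on to avoid page faults.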
Results:
We have chosen a list of queries of different lengths to run insensitive searches in a huge
database, UniProtKB/TrEMBL. Our experiments show a 20 percent improvement in performance
when workers used our proposed memory-aware technique compared to when they were not
memory-aware. Experiments show an even higher performance improvement, approximately 50
percent, when we applied our memory-aware technique to mpiBLAST.
Conclusion:
We have shown that memory-awareness in formatting a bulky database, when running
BLAST, can improve performance significantly while preventing unexpected crashes in low-memory
environments. Whereas distributed computing attempts to reduce search time by partitioning and
distributing database portions, our memory-aware technique additionally alleviates the negative
effects of page faults on performance.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
2 articles.