Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers

Published: 2019-12 Issue: S16 Volume: 20
ISSN: 1471-2105
Container title: BMC Bioinformatics
Language: English

Authors:

Ito Satoshi, Yadome Masaaki, Nishiki Tatsuo, Ishiduki Shigeru, Inoue Hikaru, Yamaguchi Rui, Miyano Satoru

Abstract

Background: Supercomputers have become indispensable infrastructure in science and industry. In particular, most state-of-the-art scientific results rely on the massively parallel supercomputers ranked in the TOP500. In bioinformatics, however, their use is still limited, chiefly because they do not provide the asynchronous parallel job-processing service of Grid Engine. To encourage the use of massively parallel supercomputers in bioinformatics, we developed middleware called Virtual Grid Engine (VGE), which enables software pipelines to run their tasks automatically as MPI programs.

Results: We conducted basic tests to measure the time VGE takes to assign jobs to workers. The overhead of the employed algorithm was 246 microseconds, and VGE managed thousands of jobs smoothly on the K computer. We also ran a practical bioinformatics test consisting of two tasks: splitting the input FASTQ data and aligning it with BWA. The calculation used 25,055 nodes (200,440 cores) and finished in three hours.

Conclusion: We identified four important requirements for this kind of software: a server program that runs without special privileges, handling of multiple jobs, dependency control, and usability. We designed VGE to satisfy all four and verified that it does. It also achieved good performance in a large-scale analysis.
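The abstract describes VGE only at a high level. A master hands a pipeline's tasks to worker processes as they become free, and dependencies between tasks must be respected. For example, FASTQ chunks can only be aligned after the split step has produced them.

The sketch below imitates that master/worker pattern with Python threads and queues in place of MPI ranks. It is an illustration under stated assumptions, not VGE's implementation. The names `Job`, `run_master_worker` and the `split`/`align_*` job labels are hypothetical and are not part of VGE's interface.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical pipeline task; `deps` lists names of jobs that must finish first."""
    name: str
    deps: list = field(default_factory=list)


def run_master_worker(jobs, n_workers=4):
    """Dispatch jobs to idle workers, releasing each job only after its deps finish.

    Threads stand in for MPI worker ranks (an assumption of this sketch).
    Returns job names in completion order.
    Raises ValueError if some jobs can never run (missing or cyclic deps).
    """
    pending = {job.name: job for job in jobs}
    done = set()
    completed_order = []
    ready = queue.Queue()    # master -> workers: jobs whose deps are satisfied
    results = queue.Queue()  # workers -> master: names of finished jobs

    def worker():
        while True:
            job = ready.get()
            if job is None:  # sentinel from the master: no more work
                return
            # A real worker would launch the task here (e.g. one BWA run).
            results.put(job.name)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    in_flight = 0

    def release_ready_jobs():
        nonlocal in_flight
        for name in list(pending):
            if all(dep in done for dep in pending[name].deps):
                ready.put(pending.pop(name))
                in_flight += 1

    release_ready_jobs()
    while in_flight:
        finished = results.get()  # block until any worker reports back
        in_flight -= 1
        done.add(finished)
        completed_order.append(finished)
        release_ready_jobs()

    # Shut the workers down before reporting any unrunnable jobs.
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()

    if pending:
        raise ValueError(f"jobs with unsatisfiable dependencies: {sorted(pending)}")
    return completed_order


if __name__ == "__main__":
    jobs = [Job("split")] + [Job(f"align_{i}", deps=["split"]) for i in range(3)]
    order = run_master_worker(jobs, n_workers=2)
    print(order[0])           # always "split": the align jobs wait for it
    print(sorted(order[1:]))  # ['align_0', 'align_1', 'align_2']
```

This sketch gives each ready job to whichever worker picks it up next, rather than pre-assigning fixed batches. That keeps workers busy when task durations vary. The abstract does not say whether VGE's own scheduler follows exactly this policy.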

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/s12859-019-3085-x.pdf
