GMSA: A Data Sharing System for Multiple Sequence Alignment Across Multiple Users

Published:2019-07-16 Issue:6 Volume:14 Page:504-515
ISSN:1574-8936
Container-title:Current Bioinformatics
language:en
Short-container-title:CBIO

Author:

Bai Na¹, Tang Shanjiang¹, Yu Ce¹, Fu Hao¹, Wang Chen², Chen Xi¹

Affiliation:

1. School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, P.O. Box 300350, China

2. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States

Abstract

Background: In recent years, the rapid growth of biological datasets in bioinformatics has made the computation of Multiple Sequence Alignment (MSA) extremely slow. Using GPUs to accelerate MSA has been shown to be an effective approach. Moreover, many bioinformatics researchers and institutes now set up shared servers to which remote users submit MSA jobs through provided web pages or tools.

Objective: Because different MSA jobs submitted by users often process similar datasets, users have an opportunity to share computation results with one another, which avoids redundant computation and thereby reduces the overall computing time. Furthermore, on heterogeneous CPU/GPU platforms, many existing applications assign their computation to GPU devices only, which wastes CPU resources. Co-run computation can increase the utilization of both CPUs and GPUs by dispatching workloads onto them simultaneously.

Methods: In this paper, we propose GMSA, an efficient MSA system for multiple users on shared heterogeneous CPU/GPU platforms. To accelerate the computation of jobs from multiple users, GMSA shares data among jobs, since different MSA jobs often have a portion of their data and tasks in common. We also propose a scheduling strategy based on the similarity of datasets or tasks between MSA jobs. Furthermore, a co-run computation model is adopted to make full use of both CPUs and GPUs.

Results: We use four protein datasets that were redesigned to have different degrees of similarity. We compare GMSA with ClustalW and CUDA-ClustalW in multi-user scenarios. Experimental results show that GMSA can achieve a speedup of up to 32X.

Conclusion: GMSA is a system designed to accelerate the computation of MSA jobs with shared input datasets on heterogeneous CPU/GPU platforms. In this system, a strategy is proposed and implemented to find the common datasets among jobs submitted by multiple users, and a scheduling algorithm is built on it. To utilize the overall resources of both CPU and GPU, GMSA employs the co-run computation model. Results show that it can speed up the total computation of jobs efficiently.
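
The data-sharing idea in the abstract can be illustrated with a minimal Python sketch. It is not the GMSA implementation: the abstract does not say how common datasets are detected or how results are stored, so the cache class, the Jaccard measure of job similarity, and the toy distance function below are all assumptions made for illustration. The sketch only shows how pairwise results computed for one job could be reused by a later job that contains the same sequences.

```python
from itertools import combinations


def pair_key(a, b):
    """Order-independent key for a pair of sequences."""
    return (a, b) if a <= b else (b, a)


def toy_distance(a, b):
    """Placeholder pairwise score (NOT a real alignment):
    fraction of positions that differ, counting the length gap as mismatches."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / n


class SharedPairCache:
    """Hypothetical cache of pairwise results shared by all submitted jobs."""

    def __init__(self, distance=toy_distance):
        self.distance = distance
        self.cache = {}
        self.computed = 0  # pairs that had to be computed
        self.reused = 0    # pairs served from an earlier job's work

    def distances_for_job(self, sequences):
        """Return pairwise distances for one job, reusing cached pairs."""
        result = {}
        for a, b in combinations(sorted(set(sequences)), 2):
            key = pair_key(a, b)
            if key in self.cache:
                self.reused += 1
            else:
                self.cache[key] = self.distance(a, b)
                self.computed += 1
            result[key] = self.cache[key]
        return result


def job_similarity(job_a, job_b):
    """Jaccard similarity of two jobs' sequence sets (an assumed measure)."""
    sa, sb = set(job_a), set(job_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Two made-up jobs that share two of their three sequences.
job1 = ["MKTAY", "MKTAV", "MRTAY"]
job2 = ["MKTAY", "MKTAV", "MQTLV"]

cache = SharedPairCache()
cache.distances_for_job(job1)
cache.distances_for_job(job2)

print(cache.computed, cache.reused)   # prints: 5 1
print(job_similarity(job1, job2))     # prints: 0.5
```

Without sharing, the two jobs would compute six pairs in total; here the pair common to both is computed once. A similarity score such as the one above could in principle be used to schedule overlapping jobs close together, which is the kind of strategy the abstract describes, though the actual scheduling algorithm is not specified there.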

Funder

National Natural Science Foundation of China

Tianjin Natural Science Foundation

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry
