Authors:
Guo Jiannan, Gao Jing, Liu Zhenyu
Abstract
Sequence alignment is one of the most important components of bioinformatics research. It is of great significance for discovering the functional structure and genetic information of nucleic acids and proteins. With the rapid development and gradual maturation of high-throughput sequencing technology, the volume of gene data it produces keeps growing. Because gene sequence alignment has high computational complexity and sequencing data sets are large, alignment consumes a great deal of computing time. HISAT2 is one of the most popular sequence alignment tools. It offers better sensitivity and accuracy than other software, and its processing speed is also greatly improved. For these reasons, this paper implements a parallelization method for HISAT2 based on an Apache Spark cluster. In a comparison experiment between a single machine and the cluster, the Spark-based parallel method runs 3.69 times as fast as the single machine while maintaining a high accuracy rate.
Subject
General Physics and Astronomy
Cited by
10 articles.