HISAT2 Parallelization Method Based on Spark Cluster

Published:2022-01-01 Issue:1 Volume:2179 Page:012038
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

Guo Jiannan, Gao Jing, Liu Zhenyu

Abstract

Sequence alignment is one of the most important components of bioinformatics research. It is of great significance for discovering the functional structure and genetic information of nucleic acids and proteins. With the rapid development and gradual maturation of high-throughput sequencing technology, the volume of gene data it produces is growing increasingly large. Because gene sequence alignment has high computational complexity and sequencing data sets are large, alignment consumes a great deal of computing time. HISAT2 is one of the most popular sequence alignment tools. It offers better sensitivity and accuracy than other software, and its processing speed is also greatly improved. For these reasons, this paper implements a HISAT2 parallelization method based on an Apache Spark cluster. In a comparison experiment between a single machine and a cluster, the Spark-based parallel method reaches a speedup of 3.69 times while maintaining a high accuracy rate.

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/2179/1/012038/pdf

References: 31 articles.

1. High-throughput Sequencing Technology and Its Application[J];Wang;China Biotechnology,2012

2. HISAT: A fast spliced aligner with low memory requirements[J];Kim;Nature Methods,2015

3. NGSANE: A Lightweight Production Informatics Framework for High Throughput Data Analysis[J];Buske;Bioinformatics,2014

4. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions[J];Kim;Genome Biology,2013

5. STAR: ultrafast universal RNA-seq aligner[J];Valencia;Bioinformatics,2014

Cited by 10 articles.

1. AITeQ: a machine learning framework for Alzheimer’s prediction using a distinctive five-gene signature;Briefings in Bioinformatics;2024-05-23

2. Identification of Crucial Modules and Genes Associated with Bt Gene Expression in Cotton;Genes;2024-04-19

3. Haplotype-resolved genome assembly provides insights into evolutionary history of the Actinidia arguta tetraploid;Molecular Horticulture;2024-02-06

4. Integration of ATAC-Seq and RNA-Seq Analysis to Identify Key Genes in the Longissimus Dorsi Muscle Development of the Tianzhu White Yak;International Journal of Molecular Sciences;2023-12-21

5. New insights into the antimicrobial mechanism of LEAP2 mutant zebrafish under Aeromonas hydrophila infection using transcriptome analysis;Fish & Shellfish Immunology;2023-12