Affiliation:
1. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, P. R. China
Abstract
Sequence assembly is one of the important topics in bioinformatics research. Sequence assembly algorithms have long suffered from poor assembly precision and low efficiency. To address these two problems, this paper designs and implements a precise assembly algorithm, based on MapReduce and the Eulerian path algorithm, under the strategy of finding the source of reads (SA-BR-MR). Computational results show that SA-BR-MR is more accurate than other algorithms. SA-BR-MR was also applied to 54 sequences randomly selected from NCBI, covering animals, plants, and microorganisms, with lengths ranging from hundreds to tens of thousands of bases. The matching rate for all 54 sequences is 100%. For each species, the range of [Formula: see text] that yields a matching rate of 100% is summarized. To verify the range of [Formula: see text] for hepatitis C virus (HCV) and related variants, eight randomly selected HCV variants were assembled. The results confirm the [Formula: see text] range for HCV and related variants from NCBI and provide a basis for sequencing other HCV variants. In addition, Spark is a newer computing platform based on in-memory computation; it is highly efficient and well suited to iterative calculation. This paper therefore also designs and implements a sequence assembly algorithm on the Spark platform under the same strategy of finding the source of reads (SA-BR-Spark). Compared with SA-BR-MR, SA-BR-Spark achieves a higher computational speed.
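For readers unfamiliar with Eulerian-path assembly, the following is a minimal single-machine sketch of the general technique, assuming a textbook de Bruijn graph construction and Hierholzer's algorithm. It is not the SA-BR-MR or SA-BR-Spark implementation: the read-source strategy and the MapReduce/Spark distribution are not reproduced, and the function names (`build_de_bruijn`, `eulerian_path`, `assemble`) and the k-mer length parameter `k` are illustrative assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes the graph actually has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break
    remaining = {u: list(vs) for u, vs in graph.items()}  # edges not yet used
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Two overlapping reads taken from the toy sequence ATGGCGTGCA.
    print(assemble(["ATGGCGT", "GCGTGCA"], 4))  # ATGGCGTGCA
```

Because this sketch keeps each distinct k-mer only once, repeated regions longer than `k - 1` bases would be collapsed; the toy input avoids such repeats.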
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
4 articles.