FastFeatGen: Faster parallel feature extraction from genome sequences and efficient prediction of DNA N6-methyladenine sites

Published: 2019-11-18

Author:

Rahman Md. Khaledur

Abstract

N6-methyladenine is widely found in both prokaryotes and eukaryotes. It is involved in many biological processes, including prokaryotic defense systems and human diseases, so locating it correctly in the genome is important for understanding the biological functions it may influence. A few computational tools exist for this purpose, but they are computationally expensive and leave room for improvement in accuracy. An informative feature extraction pipeline for genome sequences is at the heart of these tools, as it is for many other bioinformatics tools, but sequential approaches become expensive when the data are large. Hence, a scalable parallel approach is highly desirable. In this paper, we present a new tool, FastFeatGen, that emphasizes both a parallel feature extraction technique and improved accuracy using machine learning methods. We implemented our feature extraction approach using shared-memory parallelism, which achieves around a 10× speedup over the sequential version. We then employed an exploratory feature selection technique that helps find more relevant features to feed to machine learning methods. We employed the Extra-Tree Classifier (ETC) in FastFeatGen and performed experiments on rice and mouse genomes. Our experiments achieve accuracies of 85.57% and 96.64%, respectively, which are better than or competitive with current state-of-the-art methods. Our shared-memory tool can also serve queries much faster than the sequential technique. All source code and datasets are available at https://github.com/khaled-rahman/FastFeatGen.
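
The idea of extracting features from many sequences in parallel can be illustrated with a minimal sketch. It is not the FastFeatGen implementation: the abstract does not list the features used, so k-mer relative frequencies are assumed here as a stand-in, and the function names (`kmer_vocabulary`, `kmer_features`, `extract_parallel`) are hypothetical. The sketch uses a thread pool, which shares memory between workers; in CPython the global interpreter lock limits the speedup for pure-Python work, so this only shows the data-parallel structure, not the reported performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k):
    """Return all 4**k k-mers in lexicographic order (fixes the feature order)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_features(sequence, k, vocabulary):
    """Relative frequency of each vocabulary k-mer in one sequence (assumed feature)."""
    windows = len(sequence) - k + 1
    if windows <= 0:
        return [0.0] * len(vocabulary)
    counts = Counter(sequence[i:i + k] for i in range(windows))
    return [counts[kmer] / windows for kmer in vocabulary]


def extract_parallel(sequences, k=2, workers=4):
    """Apply the per-sequence extractor across a thread pool.

    Each sequence is independent, so the work splits cleanly across workers;
    pool.map returns the rows in the same order as the input sequences.
    """
    vocabulary = kmer_vocabulary(k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: kmer_features(s, k, vocabulary), sequences))
```

Calling `extract_parallel(["ACGT", "AAAA"], k=2)` yields one 16-element row per sequence, which could then be passed to a classifier of choice.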

Publisher

Cold Spring Harbor Laboratory

References (33 articles)

1. A survey of multicore processors;IEEE Signal Processing Magazine,2009

2. propy: a tool to generate various modes of Chou’s PseAAC

3. On over-fitting in model selection and subsequent selection bias in performance evaluation;Journal of Machine Learning Research,2010

4. W. Chen, H. Lv, F. Nie, and H. Lin. i6mA-Pred: Identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics, 2019.

5. Some remarks on protein attribute prediction and pseudo amino acid composition

Cited by 1 article.

1. Harnessing Current Knowledge of DNA N6-Methyladenosine From Model Plants for Non-model Crops;Frontiers in Genetics;2021-04-29