Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective

Published: 2022-08
Issue: 5
Volume: 23
Page: 299-317
ISSN: 1389-2029
Container-title: Current Genomics
Language: en
Short-container-title: CG

Author:

Shrimankar Deepti D.¹, Durge Aditi R.¹, Sawarkar Ankush D.¹

Affiliation:

1. Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India

Abstract

Genome sequences indicate a wide variety of characteristics, including species and sub-species type, genotype, diseases, growth indicators, and yield quality. To analyze these characteristics across different species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Multilayer Perceptrons (MLPs), which vary in evaluation performance, area of application, and the species they process. Because the algorithmic implementations differ so widely, it is difficult for research programmers to select the best genome processing model for their application. To facilitate this selection, the paper reviews a wide variety of such models and compares them in terms of accuracy, area of application, computational complexity, processing delay, precision, and recall. The reviewed deep learning and machine learning models achieve different accuracies for different applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves 99.7% accuracy; for cancer genomic data, the CNN Bayesian method achieves 99.27% accuracy; and for Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) achieves the highest accuracy, 99.95%. The precision and recall of the different models are reviewed in the same way. Finally, the paper concludes with observations on the genomic processing models and recommends applications for their efficient use.
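The abstract compares models by accuracy, precision, and recall. As a reminder of how these three metrics relate, here is a minimal sketch that computes them from binary labels. The function name `classification_metrics` and the toy labels are illustrative assumptions; they are not taken from the paper or from any of the reviewed models.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels.

    Hypothetical helper for illustration only; not the paper's code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators by returning 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels (made up): 1 = sequence belongs to the class of interest.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look high on imbalanced genomic datasets, which is presumably why the review also reports precision and recall.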

Publisher

Bentham Science Publishers Ltd.

Subject

Genetics (clinical), Genetics

References: 76 articles.

1. Barbeira A.N.; Melia O.J.; Liang Y.; Bonazzola R.; Wang G.; Wheeler H.E.; Aguet F.; Ardlie K.G.; Wen X.; Im H.K. Fine-mapping and QTL tissue-sharing information improves the reliability of causal gene identification. Genet Epidemiol 2020, 44(8), 854-867

2. Seo H.; Song Y.J.; Cho K.; Cho D.H. Specificity analysis of genome based on statistically identical K-words with same base combination. IEEE Open J Eng Med Biol 2020, 1, 214-219

3. Libbrecht M.W.; Noble W.S. Machine learning applications in genetics and genomics. Nat Rev Genet 2015, 16(6), 321-332

4. Schrider D.R.; Kern A.D. Supervised machine learning for population genetics: A new paradigm. Trends Genet 2018, 34(4), 301-312

5. Abbas Z.; Tayara H.; Chong K. Spinenet-6MA: A novel deep learning tool for predicting DNA N6-methyladenine sites in genomes. IEEE Access 2020, 8, 201450-201457

Cited by 3 articles.

1. DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences; Current Genomics; 2024-06

2. A Novel Fuzzy Bi-Clustering Algorithm with Axiomatic Fuzzy Set for Identification of Co-Regulated Genes; Mathematics; 2024-05-26

3. A Study and Analysis of Disease Identification using Genomic Sequence Processing Models: An Empirical Review; Current Genomics; 2023-07