Affiliation:
1. Graphic Era Deemed to be University
2. Indraprastha APOLLO Hospitals
3. NIT Srinagar
4. Adventist Health St. Helena
5. Advanced Cardiac & Vascular Institute
6. Massachusetts General Hospital
7. Queen's University
8. VINČA Institute of Nuclear Sciences - National Institute of the Republic of Serbia, University of Belgrade
9. Idaho State University
10. University of Cagliari
11. Mayo Clinic
12. AtheroPoint
Abstract
Background and Motivation: Due to the intricate relationships among small non-coding ribonucleic acid (microRNA, miRNA) sequences, classifying miRNAs by species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods lack robustness and accuracy. In this study, we present GeneAI 3.0 (AtheroPoint™, Roseville, CA, USA), a powerful, novel, and generalized method that extracts features from the fixed patterns of purines and pyrimidines in each miRNA sequence for use in ensemble machine learning (EML) and convolutional neural network (CNN)-based ensemble deep learning (EDL) frameworks.
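To make the purine/pyrimidine idea concrete, here is a minimal sketch of one way an miRNA sequence could be reduced to a two-letter pattern. It is not taken from GeneAI 3.0: the function name `to_purine_pyrimidine`, the use of the IUPAC codes R (purine: A, G) and Y (pyrimidine: C, U, T), and the toy input sequence are all assumptions made for illustration.

```python
# Illustrative sketch only; not the GeneAI 3.0 implementation.
# Purines are adenine and guanine; pyrimidines are cytosine, uracil, and thymine.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def to_purine_pyrimidine(seq: str) -> str:
    """Map each base to 'R' (purine) or 'Y' (pyrimidine), the IUPAC codes."""
    pattern = []
    for base in seq.upper():
        if base in PURINES:
            pattern.append("R")
        elif base in PYRIMIDINES:
            pattern.append("Y")
        else:
            # Reject ambiguous or unknown symbols instead of guessing.
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(pattern)


# Hypothetical toy sequence, not a curated miRNA record.
print(to_purine_pyrimidine("UGAGGUAGUAGG"))
```

Any downstream feature (for example, run lengths or texture-style statistics) would then be computed on the R/Y string rather than on the raw four-letter sequence.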
Method: GeneAI 3.0 utilized five conventional features (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary features (Shannon entropy, Hurst exponent, and Fractal dimension) to generate a composite feature set from the given miRNA sequences, which was then passed into our ML and DL classification framework. A set of 11 new classifiers, consisting of five EML and six EDL models, was designed for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models (9 + 6 + 12 = 27), for a total of 11 + 27 = 38 models. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests.
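As an illustration of one of the listed features, the sketch below computes the Shannon entropy, H = Σ p·log2(1/p), of the symbol distribution in a sequence. The abstract does not say over which alphabet or window GeneAI 3.0 evaluates this feature, so the per-symbol formulation and the function name `shannon_entropy` are assumptions.

```python
# Illustrative sketch only; the exact feature definition used by GeneAI 3.0
# is not given in the abstract.
import math
from collections import Counter


def shannon_entropy(symbols: str) -> float:
    """Return the Shannon entropy, in bits, of the symbol frequencies."""
    if not symbols:
        raise ValueError("sequence must not be empty")
    total = len(symbols)
    counts = Counter(symbols)
    # Sum p * log2(1/p) over the observed symbols.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy("ACGU"))  # four equally frequent symbols
print(shannon_entropy("RRRR"))  # a single repeated symbol
```

A sequence that uses all four bases equally often reaches the maximum of 2 bits, while a constant sequence scores 0, so the value summarizes how evenly the symbols are distributed.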
Results: The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL>HDL>SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values.
Conclusions: Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
Publisher
Research Square Platform LLC