Abstract
MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression at the RNA level. They typically range from 20 to 24 nucleotides in length and are found in plants, animals, and some viruses. Computational approaches have overcome limitations of experimental methods and have performed well in identifying miRNAs. Compared to mature miRNAs, precursor miRNAs (pre-miRNAs) are longer and form a hairpin loop structure with characteristic structural features. Therefore, most in-silico tools are implemented for pre-miRNA identification. This study presents a multilayer perceptron (MLP) based classifier implemented using 180 features from the sequential, structural, and thermodynamic feature categories for plant pre-miRNA identification. The classifier achieves 92% accuracy, 94% specificity, and 90% sensitivity. We further tested the model with other small non-coding RNA types and obtained 78% accuracy. Furthermore, we introduce a novel dataset to train and test machine learning models, addressing the overlap between the positive training and testing datasets presented in PlantMiRNAPred, a study by Xuan et al. for the classification of real and pseudo plant pre-miRNAs. The new dataset and the classifier are deployed on a web server that is freely accessible at http://mirnafinder.shyaman.me/.
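For readers less familiar with the three reported measures, the following minimal sketch shows how accuracy, sensitivity, and specificity are conventionally derived from confusion-matrix counts. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's actual test data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from raw counts.

    tp: real pre-miRNAs predicted as real (true positives)
    fn: real pre-miRNAs predicted as pseudo (false negatives)
    tn: pseudo pre-miRNAs predicted as pseudo (true negatives)
    fp: pseudo pre-miRNAs predicted as real (false positives)
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all correct predictions
        "sensitivity": tp / (tp + fn),     # share of real pre-miRNAs recovered
        "specificity": tn / (tn + fp),     # share of pseudo pre-miRNAs rejected
    }


# Hypothetical counts for a balanced test set of 100 positives and 100 negatives.
metrics = binary_metrics(tp=90, fn=10, tn=94, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a balanced test set such as the one assumed here, accuracy equals the mean of sensitivity and specificity; whether the study's test set is balanced is not stated in this abstract.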
Publisher
Cold Spring Harbor Laboratory