RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction

Published:2021-06-08 Volume:10 Page:323
ISSN:2046-1402
Container-title:F1000Research
language:en
Short-container-title:F1000Res

Author:

Ramos Thaís A.R., Galindo Nilbson R.O., Arias-Carrasco Raúl, da Silva Cecília F., Maracaja-Coutinho Vinicius, do Rêgo Thaís G.

Abstract

Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. A key step in ncRNA research is distinguishing coding from non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, Extreme Gradient Boosting, Neural Networks and Deep Learning) to model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) that distinguishes coding from non-coding sequences. First, we downloaded coding and non-coding sequences from Ensembl (April 14th, 2020). The two classes were then balanced, trinucleotide counts were computed for each sequence (64 features), and the counts were normalized by sequence length, resulting in a total of 180 models. The machine learning algorithms were validated using 10-fold cross-validation, and we selected the algorithm with the best results (eXtreme Gradient Boosting) to implement in RNAmining. The best F1-scores ranged from 97.56% to 99.57%, depending on the organism. Moreover, we benchmarked RNAmining against other tools from the literature (CPAT, CPC2, RNAcon and TransDecoder), and our results outperformed them. Both the stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/.
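
The feature extraction step described in the abstract (64 trinucleotide counts normalized by sequence length) can be illustrated with a minimal Python sketch. This is not the RNAmining source code: the function name `trinucleotide_features` is hypothetical, and the use of overlapping windows with a step of one, the conversion of U to T, and the skipping of windows that contain ambiguous bases are all assumptions, since the abstract does not specify them.

```python
from itertools import product

# All 64 trinucleotides over the DNA alphabet, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_features(sequence: str) -> list[float]:
    """Return 64 trinucleotide counts, each divided by the sequence length.

    Assumptions (not stated in the abstract): overlapping windows with
    step 1, U treated as T, and windows with other symbols skipped.
    """
    seq = sequence.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # ignore windows containing N or other symbols
            counts[kmer] += 1
    length = len(seq)
    if length == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[k] / length for k in TRINUCLEOTIDES]


features = trinucleotide_features("ATGATGAAA")
```

A vector like `features`, computed for every sequence, would form one row of the matrix passed to a classifier; the choice of classifier and its settings are outside the scope of this sketch.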

Funder

CAPES

ANID-PAI

ACCDiS

ANID-FONDECYT

ANID-FONDAP

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/10-323/v2/pdf

References (27 articles)

1. The central role of RNA in the genetic programming of complex organisms.;J Mattick;An Acad Bras Cienc.,2010

2. The Non-Coding Regulatory RNA Revolution in Archaea.;D Gelsinger;Genes (Basel).,2018

3. Causes and consequences of microRNA dysregulation in cancer.;C Croce;Nat Rev Genet.,2009

4. Cerebellar neurodegeneration in the absence of microRNAs.;A Schaefer;J Exp Med.,2007

5. Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2.;Y Zhao;Cell.,2007

Cited by 8 articles.

1. Transcriptomic Analysis Reveals Adaptive Evolution and Conservation Implications for the Endangered Magnolia lotungensis;Genes;2024-06-14

2. LncPlankton V1.0: a comprehensive collection of plankton long non-coding RNAs;2023-11-05

3. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder;Nucleic Acids Research;2023-10-27

4. Discovery of putative long non-coding RNAs expressed in the eyes of Astyanax mexicanus (Actinopterygii: Characidae);Scientific Reports;2023-07-25

5. RNAincoder: a deep learning-based encoder for RNA and RNA-associated interaction;Nucleic Acids Research;2023-05-11