A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs

Published:2022-11-24 Issue:21 Volume:50 Page:12094-12111
ISSN:0305-1048
Container-title:Nucleic Acids Research
language:en

Author:

Singh Dalwinder¹, Roy Joy¹

Affiliation:

1. National Agri-Food Biotechnology Institute, SAS Nagar, Punjab, 140306, India

Abstract

Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.

Funder

National Agri-Food Biotechnology Institute

Publisher

Oxford University Press (OUP)

Subject

Genetics

Link

https://academic.oup.com/nar/article-pdf/50/21/12094/47999368/gkac1092.pdf
