Discovering misannotated lncRNAs using deep learning training dynamics

Published: 2022-12-26 Issue: 1 Volume: 39
ISSN: 1367-4811
Container-title: Bioinformatics
Language: en

Author:

Nabi Afshan¹, Dilekoglu Berke¹, Adebali Ogun¹, Tastan Oznur¹

Affiliation:

1. Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey

Abstract

Motivation: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive.

Results: Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and then leverages these models’ training dynamics to identify misannotated lncRNAs, i.e. lncRNAs with coding potential. The set of misannotated lncRNAs we identified significantly overlaps with experimentally validated ones, and its members closely resemble coding protein sequences, as evidenced by significant BLAST hits. Our analysis of a subset of misannotated lncRNA candidates also shows that some of the ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors.

Availability and implementation: Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac821/48416491/btac821.pdf

References: 51 articles.

1. Basic local alignment search tool; Altschul; J. Mol. Biol., 1990

2. A micropeptide encoded by a putative long noncoding RNA regulates muscle performance; Anderson; Cell, 2015

3. When non-coding is not enough; Anfossi; J. Exp. Med., 2020

4. Extensive translation of small open reading frames revealed by Poly-Ribo-Seq; Aspden; eLife, 2014

5. LncRNAnet: long non-coding RNA identification using deep learning; Baek; Bioinformatics, 2018

Cited by 7 articles.

1. The potential regulatory role of the non-coding RNAs in regulating the exogenous estrogen-induced feminization in Takifugu rubripes gonad; Aquatic Toxicology; 2024-08

2. Transformer technology in molecular science; WIREs Computational Molecular Science; 2024-07

3. Micropeptides: potential treatment strategies for cancer; Cancer Cell International; 2024-04-15

4. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs; Briefings in Functional Genomics; 2024-04-04

5. Discovering microproteins: making the most of ribosome profiling data; RNA Biology; 2023-11-27