Sequence similarity governs generalizability of de novo deep learning models for RNA secondary structure prediction

Published: 2023-04-17
Volume: 19, Issue: 4, Page: e1011047
ISSN: 1553-7358
Journal: PLOS Computational Biology
Language: English
Abbreviated journal title: PLoS Comput Biol

Author:

Qiu Xiangyun

Abstract

Making no use of physical laws or co-evolutionary information, de novo deep learning (DL) models for RNA secondary structure prediction have achieved far better performance than traditional algorithms. However, their statistical underpinning raises the crucial question of generalizability. We present a quantitative study of the performance and generalizability of a series of de novo DL models, with a minimal two-module architecture and no post-processing, under varied similarities between seen and unseen sequences. Our models demonstrate excellent expressive capacity and outperform existing methods on common benchmark datasets. However, model generalizability, i.e., the performance gap between the seen and unseen sets, degrades rapidly as the sequence similarity decreases. The same trends are observed in several recent DL and machine learning models, and an inverse correlation between performance and generalizability is revealed collectively across all learning-based models with wide-ranging architectures and sizes. We further quantify how generalizability depends on sequence and structure identity scores via pairwise alignment, providing unique quantitative insights into the limitations of statistical learning. Generalizability thus poses a major hurdle for deploying de novo DL models in practice, and various pathways for future advances are discussed.
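The abstract treats generalizability as the performance gap between seen and unseen sequences and measures similarity through pairwise alignment. The following is a minimal sketch of that bookkeeping, not the paper's pipeline: this page does not state the aligner, scoring parameters, or identity definition used, so the Needleman-Wunsch scores (match +1, mismatch -1, gap -2), identity as identical columns divided by alignment length, and every function name here (`global_align_identity`, `max_identity_to_train`, `split_by_identity`, `generalizability_gap`) are hypothetical. Only sequence identity is sketched; the structure identity scores mentioned above are not.

```python
# Hypothetical sketch: scoring parameters, the identity definition, and all
# function names are illustrative assumptions, not the paper's method.


def global_align_identity(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch global alignment of a and b.

    Returns identical columns / alignment length (gap columns count toward length).
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning residue x against residue y.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identical residues.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # residue of a aligned to a gap
        else:
            j -= 1  # residue of b aligned to a gap
        length += 1
    return matches / length if length else 0.0


def max_identity_to_train(seq: str, train_seqs: list[str]) -> float:
    """Highest pairwise identity between one test sequence and any training sequence."""
    return max(global_align_identity(seq, t) for t in train_seqs)


def split_by_identity(test_seqs: list[str], train_seqs: list[str],
                      threshold: float) -> tuple[list[str], list[str]]:
    """Partition test sequences into (similar, dissimilar) relative to the training set."""
    similar, dissimilar = [], []
    for s in test_seqs:
        target = similar if max_identity_to_train(s, train_seqs) >= threshold else dissimilar
        target.append(s)
    return similar, dissimilar


def generalizability_gap(seen_scores: list[float], unseen_scores: list[float]) -> float:
    """Mean per-sequence score on seen data minus mean score on unseen data."""
    return sum(seen_scores) / len(seen_scores) - sum(unseen_scores) / len(unseen_scores)
```

In use, the per-sequence scores passed to `generalizability_gap` would come from whatever accuracy metric a study adopts (base-pair F1 would be one assumed choice), computed once on the training set and once on each identity-binned test subset, so the gap can be tracked as the threshold drops. The quadratic dynamic program is fine for illustration; a real pipeline comparing every test sequence against a large training set would more plausibly rely on a dedicated alignment tool.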

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

References: 67 articles

1. RNA secondary structure: physical and computational aspects; PG Higgs; Q Rev Biophys, 2000

2. Recent advances in RNA folding; J Fallmann; J Biotechnol, 2017

3. RNA folding: conformational statistics, folding kinetics, and ion electrostatics; SJ Chen; Annu Rev Biophys, 2008

4. The noncoding RNA revolution – trashing old rules to forge new ones; TR Cech; Cell, 2014

5. The four dimensions of noncoding RNA conservation; S Diederichs; Trends Genet, 2014

Cited by: 5 articles

1. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction; Journal of Molecular Biology; 2024-09

2. Deep dive into RNA: a systematic literature review on RNA structure prediction using machine learning methods; Artificial Intelligence Review; 2024-08-15

3. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials; Communications Biology; 2024-03-09

4. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction; 2024-02-02

5. Deep learning models of RNA base-pairing structures generalize to unseen folds and make accurate zero-shot predictions of base-base interactions of RNA complexes; 2023-09-28