Author:
Chasles Simon, Major François
Abstract
Introduction: Prediction of RNA secondary structure from single sequences still needs substantial improvement. The application of machine learning (ML) to this problem has become increasingly popular. However, ML algorithms are prone to overfitting, limiting the ability to learn more about the inherent mechanisms governing RNA folding. It is natural to use high-capacity models when solving such a difficult task, but poor generalization is expected when too few examples are available.
Methods: Here, we report the relation between capacity and performance on a fundamental related problem: determining whether two sequences are fully complementary. Our analysis focused on the impact of model architecture and capacity, as well as dataset size and nature, on classification accuracy.
Results: We observed that low-capacity models are better suited for learning with mislabelled training examples, while large capacities improve the ability to generalize to structurally dissimilar data. It turns out that neural networks struggle to grasp the fundamental concept of base complementarity, especially in a lengthwise extrapolation context.
Discussion: Given a more complex task like RNA folding, it comes as no surprise that the scarcity of usable examples hinders the applicability of machine learning techniques to this field.
Subject
Genetics (clinical), Genetics, Molecular Medicine
References: 35 articles.
1. Learning in high dimension always amounts to extrapolation
Balestriero R.
Pesenti J.
LeCun Y.
2021
2. Reconciling modern machine-learning practice and the classical bias–variance trade-off;Belkin;Proc. Natl. Acad. Sci.,2019
3. Training neural networks for and by interpolation
Berrada L.
Zisserman A.
Kumar M. P.
2020
4. RCSB Protein Data Bank: celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D;Burley;Protein Sci.,2022
5. RNA secondary structure prediction by learning unrolled algorithms
Chen X.
Li Y.
Umarov R.
Gao X.
Song L.
2020