A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

Published:2011-12-22 Issue:2 Volume:18 Page:193-212
ISSN:1355-8382
Container-title:RNA
language:en
Short-container-title:RNA

Author:

Rivas Elena, Lang Raymond, Eddy Sean R.

Abstract

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

Publisher

Cold Spring Harbor Laboratory

Subject

Molecular Biology

References: 81 articles.

1. Efficient parameter estimation for RNA secondary structure prediction

2. Computational approaches for RNA energy parameter estimation

3. Backofen R, Tsur D, Zakov S, Ziv-Ukelson M. 2009. Sparse RNA folding: time and space efficient algorithms. In Proceedings of the 20th Symposium on Combinatorial Pattern Matching, pp. 249–262. Springer-Verlag, Berlin, Heidelberg.

4. The Complete Atomic Structure of the Large Ribosomal Subunit at 2.4 Å Resolution

5. RNAcentral: A vision for an international database of RNA sequences

Cited by 98 articles.

1. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction;Journal of Molecular Biology;2024-09

2. NNDB: An Expanded Database of Nearest Neighbor Parameters for Predicting Stability of Nucleic Acid Secondary Structures;Journal of Molecular Biology;2024-09

3. Limits of experimental evidence in RNA secondary structure prediction;Frontiers in Bioinformatics;2024-02-22

4. Recent applications of artificial intelligence in RNA-targeted small molecule drug discovery;Expert Opinion on Drug Discovery;2024-02-06

5. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction;2024-02-02