Abstract
The complex and versatile structures of RNA molecules allow them to perform a wide range of functions in the cell. Precise secondary structure information offers deeper insight into the function of many RNA molecules. Earlier approaches, which relied primarily on free-energy minimization, proved inadequate because RNA often adopts complex folds that do not strictly follow free-energy rules, especially as sequence length increases. Learning-based models were subsequently introduced, but they struggled with cross-family data and tended to exclude longer sequences to limit computational overhead and the risk of overfitting or underfitting arising from limited data. Here, we propose rpcFold, an RNA secondary structure prediction model based on a residual parallel convolutional neural network, which accommodates RNAs of varying lengths by integrating a sliding-window method. We further consider all 16 possible base pairs, both canonical and noncanonical, and compute a base-pairing possibility score for each nucleotide in the sequence using a locally weighted Gaussian function. This captures detailed base-pairing information and helps characterize both long- and short-range interactions. These input features are passed to rpcFold as an image-like representation. rpcFold outperforms existing methods, improving the F1-score by 15% over the previous best score on the benchmark cross-family dataset. On the within-family dataset, it improves the F1-score by 7% and accuracy by 5% over the previous best scores. Precise prediction of RNA secondary structures will help reveal the roles of functional RNAs and advance the development of RNA-based biomedical applications.
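The abstract does not spell out the exact feature encoding. The Python sketch below shows one plausible reading under stated assumptions.

- The 16 possible base combinations become 16 one-hot channels of an L x L image, similar to the input encoding used by earlier models such as UFold.
- A 17th channel holds a locally weighted Gaussian pairing score. For a candidate pair (i, j), this score sums the weights of stacked neighbouring pairs (i-k, j+k) and (i+k, j-k), each damped by exp(-k^2/2).
- A simple sliding-window splitter handles long sequences.

The pairing weights in `PAIR_WEIGHT` are hypothetical and are not rpcFold's actual parameters or code. The same applies to `MIN_LOOP` and `max_offset`, and to the helpers `sliding_windows`, `gaussian_pair_score` and `encode`.

```python
import math

BASES = "ACGU"
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop

# Hypothetical pairing weights (an assumption, not rpcFold's values):
# Watson-Crick pairs and the G-U wobble are nonzero; the other 10 of the
# 16 base combinations score 0.
PAIR_WEIGHT = {("A", "U"): 2.0, ("U", "A"): 2.0,
               ("G", "C"): 3.0, ("C", "G"): 3.0,
               ("G", "U"): 0.8, ("U", "G"): 0.8}


def sliding_windows(seq, size, stride):
    """Split seq into overlapping (start, subsequence) windows; the last window ends at len(seq)."""
    if len(seq) <= size:
        return [(0, seq)]
    starts = list(range(0, len(seq) - size + 1, stride))
    if starts[-1] != len(seq) - size:
        starts.append(len(seq) - size)  # make sure the 3' end is covered
    return [(s, seq[s:s + size]) for s in starts]


def gaussian_pair_score(seq, i, j, max_offset=30):
    """Gaussian-weighted score for pairing seq[i] with seq[j], assuming i < j.

    Adds the weights of stacked neighbours (i-k, j+k) and (i+k, j-k),
    each damped by exp(-k^2 / 2), stopping at the first neighbour that
    cannot pair.
    """
    def p(a, b):
        return PAIR_WEIGHT.get((seq[a], seq[b]), 0.0)

    if j - i - 1 < MIN_LOOP or p(i, j) == 0.0:
        return 0.0
    score = p(i, j)  # k = 0 term, weight exp(0) = 1
    for k in range(1, max_offset + 1):  # extend the stem outward
        a, b = i - k, j + k
        if a < 0 or b >= len(seq) or p(a, b) == 0.0:
            break
        score += math.exp(-k * k / 2.0) * p(a, b)
    for k in range(1, max_offset + 1):  # extend the stem inward
        a, b = i + k, j - k
        if b - a - 1 < MIN_LOOP or p(a, b) == 0.0:
            break
        score += math.exp(-k * k / 2.0) * p(a, b)
    return score


def encode(seq):
    """Return a 17 x L x L image: 16 one-hot base-pair channels plus one score channel."""
    n = len(seq)
    idx = [BASES.index(c) for c in seq]  # raises ValueError on non-ACGU input
    img = [[[0.0] * n for _ in range(n)] for _ in range(17)]
    for i in range(n):
        for j in range(n):
            # Channel 4*a + b marks the base combination (seq[i], seq[j]).
            img[4 * idx[i] + idx[j]][i][j] = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            s = gaussian_pair_score(seq, i, j)
            img[16][i][j] = img[16][j][i] = s  # symmetric score channel
    return img
```

To process a long sequence under this sketch, split it with `sliding_windows(seq, size, stride)` and encode each window with `encode(window)` before passing it to the network. The abstract does not state how predictions from overlapping windows are merged (for example, by averaging), so that step is left open here.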
Publisher
Cold Spring Harbor Laboratory