Multiple sequence-alignment-based RNA language model and its application to structural inference

Published: 2023-03-16

Authors:

Zhang Yikun, Lang Mei, Jiang Jiuhong, Gao Zhiqiang, Xu Fan, Litfin Thomas, Chen Ke, Singh Jaswinder, Huang Xiansong, Song Guoli, Tian Yonghong, Zhan Jian, Chen Jie, Zhou Yaoqi

Abstract

Compared to proteins, DNA and RNA are more difficult languages to interpret because 4-letter-coded DNA/RNA sequences have less information content than 20-letter-coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing evolutionary information from homologous sequences because, unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised Multiple sequence-alignment-based RNA language model (RNA-MSM) using homologous sequences obtained from RNAcmap, an automatic pipeline. The resulting unsupervised two-dimensional attention maps and one-dimensional embeddings from RNA-MSM can be directly mapped with high accuracy to 2D base-pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques. We anticipate that the pre-trained RNA-MSM model can be fine-tuned for many other tasks related to RNA structure and function.
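The abstract states that RNA-MSM's 2D attention maps map directly to base-pairing probabilities and its 1D embeddings to solvent accessibility, but does not describe the mapping. The sketch below is a minimal illustration under an assumption: it uses a logistic-regression readout over symmetrized, APC-corrected (average product correction) attention maps, the scheme used for unsupervised contact prediction with protein language models such as the MSA Transformer. All function names, the weights, and the random toy inputs are hypothetical stand-ins, not the published RNA-MSM code or data.

```python
import math
import random

# Hypothetical readout sketch: NOT the published RNA-MSM code. It assumes a
# logistic-regression head over symmetrized, APC-corrected attention maps,
# the scheme used for contact prediction with protein language models.


def symmetrize(a):
    """Return (A + A^T) / 2 for a square matrix stored as a list of lists."""
    n = len(a)
    return [[(a[i][j] + a[j][i]) / 2 for j in range(n)] for i in range(n)]


def apc(a):
    """Average product correction: A_ij - (row_i * col_j) / total."""
    n = len(a)
    row = [sum(r) for r in a]
    col = [sum(a[i][j] for i in range(n)) for j in range(n)]
    total = sum(row)
    if total == 0:
        return [r[:] for r in a]  # nothing to correct; avoid dividing by zero
    return [[a[i][j] - row[i] * col[j] / total for j in range(n)] for i in range(n)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairing_probabilities(attention_maps, weights, bias):
    """Map K attention maps (each L x L) to an L x L matrix of pairing probabilities.

    `weights` (one per map) and `bias` are hypothetical; in practice they
    would be fit on RNAs with known secondary structure.
    """
    if len(attention_maps) != len(weights):
        raise ValueError("need exactly one weight per attention map")
    features = [apc(symmetrize(m)) for m in attention_maps]
    n = len(attention_maps[0])
    return [
        [sigmoid(bias + sum(w * f[i][j] for w, f in zip(weights, features)))
         for j in range(n)]
        for i in range(n)
    ]


def accessibility(embeddings, weights, bias):
    """Map per-nucleotide embeddings (L x D) to L values in (0, 1), read as relative accessibility."""
    return [sigmoid(bias + sum(w * x for w, x in zip(weights, e))) for e in embeddings]


# Toy usage with random stand-ins for RNA-MSM outputs (not real model data).
random.seed(0)
L, K, D = 5, 3, 4
maps = [[[random.random() for _ in range(L)] for _ in range(L)] for _ in range(K)]
probs = pairing_probabilities(maps, weights=[0.5, -0.2, 1.0], bias=-1.0)
embeddings = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
rsa = accessibility(embeddings, weights=[0.3, -0.1, 0.2, 0.05], bias=0.0)
```

Symmetrizing reflects that base pairing is symmetric (i pairs with j exactly when j pairs with i), and APC subtracts the part of each score explained by row and column totals, leaving pair-specific signal. A trained model would replace the toy weights with values fit on RNAs of known structure.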

Publisher

Cold Spring Harbor Laboratory

Cited by 4 articles.

1. All-Atom Biomolecular Simulation in the Exascale Era; Journal of Chemical Theory and Computation; 2024-02-21

2. Predicting RNA structures and functions by artificial intelligence; Trends in Genetics; 2024-01

3. ATOM-1: A Foundation Model for RNA Structure and Function Built on Chemical Mapping Data; 2023-12-14

4. Uni-RNA: Universal Pre-trained Models Revolutionize RNA Research; 2023-07-12