ERNIE-RNA: An RNA Language Model with Structure-enhanced Representations

Author:

Yin WeijieORCID,Zhang Zhaoyu,He LiangORCID,Jiang RuiORCID,Zhang Shuo,Liu GanORCID,Zhang XuegongORCID,Qin TaoORCID,Xie ZhenORCID

Abstract

AbstractWith large amounts of unlabeled RNA sequences data produced by high-throughput sequencing technologies, pre-trained RNA language models have been developed to estimate semantic space of RNA molecules, which facilities the understanding of grammar of RNA language. However, existing RNA language models overlook the impact of structure when modeling the RNA semantic space, resulting in incomplete feature extraction and suboptimal performance across various downstream tasks. In this study, we developed a RNA pre-trained language model named ERNIE-RNA (EnhancedRepresentations with base-pairing restriction forRNAmodeling) based on a modified BERT (Bidirectional Encoder Representations from Transformers) by incorporating base-pairing restriction with no MSA (Multiple Sequence Alignment) information. We found that the attention maps from ERNIE-RNA with no fine-tuning are able to capture RNA structure in the zero-shot experiment more precisely than conventional methods such as fine-tuned RNAfold and RNAstructure, suggesting that the ERNIE-RNA can provide comprehensive RNA structural representations. Furthermore, ERNIE-RNA achieved SOTA (state-of-the-art) performance after fine-tuning for various downstream tasks, including RNA structural and functional predictions. In summary, our ERNIE-RNA model provides general features which can be widely and effectively applied in various subsequent research tasks. Our results indicate that introducing key knowledge-based prior information in the BERT framework may be a useful strategy to enhance the performance of other language models.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3