Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction

Published: 2023-02-03

Author:

Chen Ken, Zhou Yue, Ding Maolin, Wang Yu, Ren Zhixiang, Yang Yuedong

Abstract

RNA splicing is an important post-transcriptional process of gene expression in eukaryotic cells. Predicting RNA splicing from primary sequences can facilitate the interpretation of genomic variants. In this study, we developed a novel self-supervised pre-trained language model, SpliceBERT, to improve sequence-based RNA splicing prediction. Pre-training on pre-mRNA sequences from vertebrates enables SpliceBERT to capture evolutionary conservation information and characterize the unique properties of splice sites. SpliceBERT also improves zero-shot prediction of variant effects on splicing by considering sequence context information, and achieves superior performance in predicting branchpoints in the human genome and splice sites across species. Our study highlights the importance of pre-training genomic language models on a diverse range of species and suggests that pre-trained language models are promising for deciphering the sequence logic of RNA splicing.

Publisher

Cold Spring Harbor Laboratory

References: 65 articles.

1. Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals

2. Deciphering the splicing code

3. The human splicing code reveals new insights into the genetic determinants of disease

4. Assessing predictions of the impact of variants on splicing in CAGI5

5. ESEfinder: a web resource to identify exonic splicing enhancers

Cited by 5 articles.

1. BEACON: Benchmark for Comprehensive RNA Tasks and Language Models;2024-06-28

2. Multimodal Large Language Models in Healthcare: Applications, Challenges, and Future Outlook (Preprint);Journal of Medical Internet Research;2024-04-13

3. Perturbation-aware predictive modeling of RNA splicing using bidirectional transformers;2024-03-21

4. Evaluating the representational power of pre-trained DNA language models for regulatory genomics;2024-03-04

5. Uni-RNA: Universal Pre-trained Models Revolutionize RNA Research;2023-07-12