BulkRNABert: Cancer prognosis from bulk RNA-seq based language models

Published: 2024-06-22

Author:

Gélard Maxence, Richard Guillaume, Pierrot Thomas, Cournède Paul-Henry

Abstract

RNA sequencing (RNA-seq) has become a key technology in precision medicine, especially for cancer prognosis. However, the high dimensionality of such data may limit classical statistical methods, raising the need to learn dense representations from them. Transformer models have proven capable of providing representations for long sequences and are thus well suited to transcriptomics data. In this paper, we develop a pre-trained transformer-based language model through self-supervised learning on bulk RNA-seq from both non-cancer and cancer tissues, following BERT’s masking method. By probing learned embeddings from the model or using parameter-efficient fine-tuning, we then build downstream models for cancer type classification and survival time prediction. Leveraging the TCGA dataset, we demonstrate the performance of our method, BulkRNABert, on both tasks, with significant improvement compared to state-of-the-art methods in the pan-cancer setting for classification and survival analysis. We also show the transfer-learning capabilities of the model in the survival analysis setting on unseen cohorts. Code is available at https://github.com/instadeepai/multiomics-open-research
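The masking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binning of expression values into discrete tokens, the function names `tokenize_expression` and `mask_expression_tokens`, the 64-bin and 15% settings, and the `-100` ignore label are all assumptions made for illustration. The sketch also simplifies BERT's scheme by always substituting the mask token, whereas BERT sometimes keeps the original token or swaps in a random one.

```python
import numpy as np


def tokenize_expression(expr, n_bins=64):
    """Hypothetical tokenizer: map expression values to integer bins 0..n_bins-1."""
    logged = np.log1p(np.asarray(expr, dtype=float))
    # Interior bin edges, evenly spaced on the log scale (an assumed choice).
    edges = np.linspace(0.0, logged.max(), n_bins + 1)[1:-1]
    return np.digitize(logged, edges)


def mask_expression_tokens(tokens, mask_token_id, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with a mask token.

    Returns (masked_tokens, labels). Labels hold the original token at masked
    positions and -100 elsewhere, so a loss can skip unmasked positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    tokens = np.asarray(tokens)
    is_masked = rng.random(tokens.shape) < mask_prob
    masked_tokens = np.where(is_masked, mask_token_id, tokens)
    labels = np.where(is_masked, tokens, -100)
    return masked_tokens, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for bulk RNA-seq: 4 samples x 100 genes.
    expr = rng.gamma(shape=2.0, scale=50.0, size=(4, 100))
    tokens = tokenize_expression(expr, n_bins=64)
    # The mask token gets an id outside the bin range.
    masked, labels = mask_expression_tokens(tokens, mask_token_id=64, rng=rng)
    print(masked.shape, int((masked == 64).sum()), "positions masked")
```

A model trained this way would predict the original bin at each masked position from the surrounding genes, which is the self-supervised signal the abstract refers to.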

Publisher

Cold Spring Harbor Laboratory
