Author:
Padegal Girivinay, Rao Murali Krishna, Boggaram Ravishankar Om Amitesh, Acharya Sathwik, Athri Prashanth, Srinivasa Gowri
Abstract
Background
RNA sequencing (RNA-Seq) is a technique that uses next-generation sequencing to study a cellular transcriptome, i.e., to determine the amount of RNA present in a given biological sample at a given time. Advances in RNA-Seq technology have produced a large volume of gene expression data for analysis.
Results
Our computational model (built on top of TabNet) is first pretrained on an unlabelled dataset of multiple types of adenomas and adenocarcinomas and then fine-tuned on the labelled dataset. It shows promising results in estimating the vital status of colorectal cancer patients: using multiple modalities of data, we achieve a final cross-validated ROC-AUC score of 0.88.
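For readers unfamiliar with the metric, the following is a minimal sketch of how a cross-validated ROC-AUC score can be computed. It is not the authors' evaluation code: the function names `roc_auc` and `cross_validated_auc` and the toy labels and scores are hypothetical, and the sketch assumes the cross-validated score is the plain mean of the per-fold scores, which the abstract does not state.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cross_validated_auc(folds):
    """Average the ROC-AUC over held-out folds.
    Each fold is a (labels, scores) pair for that fold's validation split.
    Assumption: folds are weighted equally."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical held-out predictions for two folds (1 = positive class).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # every positive outranks every negative
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),  # one of four pairs is misordered
]
fold_scores = [roc_auc(labels, scores) for labels, scores in folds]
cv_score = cross_validated_auc(folds)
```

On this toy data the per-fold scores are 1.0 and 0.75, so the averaged score is 0.875. In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic pairwise loop.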
Conclusion
The results of this study demonstrate that self-supervised learning methods pretrained on a vast corpus of unlabelled data outperform traditional supervised learning methods such as XGBoost, neural networks, and decision trees, which have been prevalent in the tabular domain. Including multiple modalities of patient data improves the results further. Genes that model interpretability identifies as important to the computational model's prediction task, including RBM3, GSPT1, and MAD2L1, are corroborated by pathological evidence in the current literature.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
2 articles.
1. Potential of GSPT1 as a novel target for glioblastoma therapy;Cell Death & Disease;2024-08-08
2. Predicting Pedestrian Involvement in Fatal Crashes Using a TabNet Deep Learning Model;Proceedings of the 16th ACM SIGSPATIAL International Workshop on Computational Transportation Science;2023-11-13