Abstract
Deep learning-powered computational pathology has led to significant improvements in the speed and precision of tumor diagnosis, while also exhibiting substantial potential to infer genetic mutations and gene expression levels. However, current studies remain limited in predicting molecular subtypes and recurrence risk in breast cancer. In this paper, we propose a weakly supervised contrastive learning framework to address this challenge. Our framework first performs contrastive learning pretraining on large-scale unlabeled patches tiled from whole slide images (WSIs) to extract patch-level features. A gated attention mechanism then aggregates the patch-level features into a slide-level feature, which is applied to various downstream tasks. To confirm the effectiveness of the proposed method, we conducted extensive experiments on four independent breast cancer cohorts. For the gene expression prediction task, rather than training one model per gene, we adopted multitask learning to infer the expression levels of 21 recurrence-related genes, and achieved strong performance and generalizability, which were validated on an external cohort. In particular, the model's power to predict molecular subtypes and recurrence events was validated by cross-cohort experiments. In addition, the learned patch-level attention scores enabled us to generate heatmaps that were highly consistent with pathologist annotations and spatial transcriptomic data. These findings demonstrate that our model effectively establishes high-order genotype-phenotype associations, thereby enhancing the potential of digital pathology in clinical applications.
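The aggregation step described above, in which patch-level features are combined into a single slide-level feature, can be illustrated with a minimal sketch. It assumes the commonly used gated attention pooling form from attention-based multiple instance learning (a tanh branch gated element-wise by a sigmoid branch, followed by a softmax over patches). The abstract does not specify the exact formulation, so the function name, the weight names `V`, `U`, `w`, the dimensions, and the random inputs are all hypothetical.

```python
import math
import random


def gated_attention_pool(patches, V, U, w):
    """Pool patch features (n x d) into one slide feature (length d).

    Assumed form (not confirmed by the paper):
        score_i = w . (tanh(V h_i) * sigmoid(U h_i))
        attn    = softmax(score)
        slide   = sum_i attn_i * h_i
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    scores = []
    for h in patches:
        t = [math.tanh(v) for v in matvec(V, h)]               # content branch
        g = [1.0 / (1.0 + math.exp(-u)) for u in matvec(U, h)]  # gate branch
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))

    # Numerically stable softmax over the patch scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Attention-weighted sum of patch features.
    d = len(patches[0])
    slide = [sum(a * h[j] for a, h in zip(attn, patches)) for j in range(d)]
    return slide, attn


# Toy inputs: 5 patches, 4-dim features, 3 hidden attention units (all hypothetical).
rng = random.Random(0)
n, d, k = 5, 4, 3
patches = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
U = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
w = [rng.gauss(0, 1) for _ in range(k)]

slide, attn = gated_attention_pool(patches, V, U, w)
print(len(slide), [round(a, 3) for a in attn])
```

In a sketch like this, the per-patch weights in `attn` are the quantities that could be mapped back to patch coordinates to draw an attention heatmap; in a trained model `V`, `U`, and `w` would be learned rather than sampled at random.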
Publisher
Cold Spring Harbor Laboratory