Author:
Markey Miles, Kim Juhyun, Goldstein Zvi, Gerardin Ylaine, Brosnan-Cashman Jacqueline, Javed Syed Ashar, Juyal Dinkar, Pagidela Harshith, Yu Limin, Rahsepar Bahar, Abel John, Hennek Stephanie, Khosla Archit, Taylor-Weiner Amaro, Parmar Chintan
Abstract
The relative abundance of cancer-associated fibroblast (CAF) subtypes influences a tumor’s response to treatment, especially immunotherapy. However, the extent to which the underlying tumor composition associates with CAF subtype-specific gene expression is unclear. Here, we describe an interpretable machine learning (ML) approach, additive multiple instance learning (aMIL), to predict bulk gene expression signatures from H&E-stained whole slide images (WSI), focusing on an immunosuppressive LRRC15+ CAF-enriched TGFβ-CAF signature. aMIL models accurately predicted TGFβ-CAF across various cancer types. Tissue regions contributing most highly to slide-level predictions of TGFβ-CAF were evaluated by ML models characterizing spatial distributions of diverse cell and tissue types, stromal subtypes, and nuclear morphology. In breast cancer, regions contributing most to TGFβ-CAF-high predictions (“excitatory”) were localized to cancer stroma with high fibroblast density and mature collagen fibers. Regions contributing most to TGFβ-CAF-low predictions (“inhibitory”) were localized to cancer epithelium and densely inflamed stroma. Fibroblast and lymphocyte nuclear morphology also differed between excitatory and inhibitory regions. Thus, aMIL enables a data-driven link between tissue phenotype and transcription, offering biological interpretability beyond typical black-box models.
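The idea of per-region contributions to a slide-level prediction can be illustrated with a minimal sketch. It assumes a common additive MIL formulation in which each patch embedding is weighted by a softmax attention score, passed through a predictor, and the per-patch outputs are summed; the abstract does not specify the authors' architecture, so the function name `additive_mil_predict`, the linear predictor, and the random embeddings below are all hypothetical stand-ins rather than the published method.

```python
import numpy as np

def additive_mil_predict(patch_embeddings, attn_w, pred_w, pred_b=0.0):
    """Hypothetical additive MIL forward pass for one slide (regression).

    patch_embeddings: (n_patches, dim) array of patch features (assumed given).
    attn_w, pred_w:   (dim,) weight vectors; a linear predictor is an
                      illustrative simplification, not the authors' model.
    """
    # Attention score per patch, softmax-normalized over the whole slide.
    scores = patch_embeddings @ attn_w
    scores = scores - scores.max()  # numerical stability
    attention = np.exp(scores) / np.exp(scores).sum()

    # Each patch's additive contribution: the predictor applied to that
    # patch's attention-weighted embedding.
    contributions = (attention[:, None] * patch_embeddings) @ pred_w

    # The slide-level prediction is exactly the sum of patch contributions.
    slide_prediction = contributions.sum() + pred_b
    return slide_prediction, contributions

# Toy data standing in for real patch embeddings (assumption).
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
attn_w = rng.normal(size=4)
pred_w = rng.normal(size=4)

prediction, contributions = additive_mil_predict(patches, attn_w, pred_w)

# Patches pushing the prediction up ("excitatory") or down ("inhibitory").
excitatory = np.argsort(contributions)[::-1][:2]
inhibitory = np.argsort(contributions)[:2]
```

Because the prediction decomposes into a sum, ranking patches by signed contribution gives a direct reading of which regions drive a high or low predicted signature, which is the property the abstract relies on when describing excitatory and inhibitory regions.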
Publisher
Cold Spring Harbor Laboratory