Abstract
Modern multi-omic technologies can generate deep multi-scale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make the analysis and integration of high-dimensional omic datasets challenging. Here, we present Significant Latent factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors and the corresponding inference, outperforms or performs at least as well as state-of-the-art approaches in terms of prediction, and provides inference beyond prediction. Using SLIDE on scRNA-seq data from systemic sclerosis (SSc) patients, we first uncovered significant interacting latent factors underlying SSc pathogenesis. In addition to accurately predicting SSc severity and outperforming existing benchmarks, SLIDE uncovered significant factors that included well-elucidated altered transcriptomic states in myeloid cells and fibroblasts, an intriguing keratinocyte-centric signature validated by protein staining, and a novel mechanism involving altered HLA signaling in myeloid cells that is supported by genetic data. SLIDE also worked well on spatial transcriptomic data, accurately identifying significant interacting latent factors underlying immune cell partitioning by 3D location within lymph nodes. Finally, SLIDE leveraged paired scRNA-seq and TCR-seq data to elucidate latent factors underlying the extent of clonal expansion of CD4 T cells in a nonobese diabetic model of type 1 diabetes (T1D).
The latent factors uncovered by SLIDE included well-known activation markers, inhibitory receptors, and intracellular regulators of receptor signaling, but also homed in on several novel naïve and memory states that standard analyses missed. Overall, SLIDE is a versatile engine for biological discovery from modern multi-omic datasets.
Publisher
Cold Spring Harbor Laboratory