Abstract
In the last decade, immunotherapies targeting immune checkpoints have been extremely effective in eliminating subsets of some cancers in some patients. Multi-modal immune and non-immune factors that contribute to clinical outcomes have been used to predict response to therapies and to develop diagnostics. However, these analyses combine complex mathematical methods with even more complex biological mechanisms. To develop a method for analyzing transcriptomics data sets, we used an explainable machine learning (ML) model to investigate the genes involved in the signaling pathway of T-cell immunoreceptor with immunoglobulin and ITIM domain (TIGIT). TIGIT is a receptor on T, NK, and T-regulatory cells that has been classified as an inhibitory immune checkpoint because of its ability to inhibit innate and adaptive immune responses. We extracted whole genome sequencing data for 1029 early breast cancer patient tumors and adjacent normal tissues from the TCGA and UCSC Xena Data Hub public databases. Our workflow involved the following steps: (i) data acquisition, processing, and visualization; (ii) development of a predictive prognostic model with gene expression data as input and survival time as output; (iii) model interpretation by calculating SHAP (SHapley Additive exPlanations) values; and (iv) application of the model, a Cox regression model trained with L2 regularization and optimized using 5-fold cross-validation. The model identified gene signatures associated with TIGIT that predicted survival outcome on a test set with a score of 0.601. In summary, we have used this case study of TIGIT-mediated signaling pathways to develop a roadmap for biologists to harness ML methods effectively.
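The modeling steps named in the abstract (an L2-regularized Cox regression, 5-fold cross-validation, and SHAP-based interpretation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the data are synthetic, all function names (`fit_cox_l2`, `cross_validate`, `concordance_index`, `linear_shap`) are hypothetical, the use of cross-validation to compare penalty strengths is an assumption, and the concordance index is an assumed evaluation metric, since the abstract does not say which score was reported. The SHAP step uses the closed form for a linear predictor under an assumed feature independence.

```python
import math
import random

def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is anchored by an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def fit_cox_l2(X, times, events, l2=0.1, lr=0.05, steps=150):
    """Gradient descent on the mean negative Cox partial log-likelihood
    plus an L2 penalty (l2 / 2) * ||beta||^2."""
    n, p = len(X), len(X[0])
    # Risk set of an event: every subject still under observation at that time.
    risk_sets = {i: [j for j in range(n) if times[j] >= times[i]]
                 for i in range(n) if events[i]}
    beta = [0.0] * p
    for _ in range(steps):
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [l2 * b for b in beta]
        for i, rs in risk_sets.items():
            denom = sum(w[j] for j in rs)
            for k in range(p):
                mean_k = sum(w[j] * X[j][k] for j in rs) / denom
                grad[k] -= (X[i][k] - mean_k) / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

def cross_validate(X, times, events, l2, k=5):
    """Mean held-out concordance index over k folds (fold = index mod k)."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        beta = fit_cox_l2([X[i] for i in train], [times[i] for i in train],
                          [events[i] for i in train], l2=l2)
        risks = [sum(b * x for b, x in zip(beta, X[i])) for i in test]
        scores.append(concordance_index([times[i] for i in test],
                                        [events[i] for i in test], risks))
    return sum(scores) / k

def linear_shap(beta, X, row):
    """SHAP values of the linear log-hazard, assuming independent features:
    beta_k * (x_k - mean of feature k over the background data X)."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [b * (x - m) for b, x, m in zip(beta, row, means)]

# Synthetic stand-in for expression data: feature 0 raises the hazard.
rng = random.Random(0)
X, times, events = [], [], []
for _ in range(50):
    row = [rng.gauss(0, 1), rng.gauss(0, 1)]
    t_event = rng.expovariate(math.exp(0.8 * row[0]))
    t_censor = rng.expovariate(0.3)  # independent censoring time
    X.append(row)
    times.append(min(t_event, t_censor))
    events.append(t_event <= t_censor)

for l2 in (0.01, 0.1, 1.0):
    print(f"l2={l2}: mean 5-fold C-index = {cross_validate(X, times, events, l2):.3f}")

beta = fit_cox_l2(X, times, events, l2=0.1)
print("per-feature SHAP values for the first sample:", linear_shap(beta, X, X[0]))
```

In practice, an established survival analysis library and the SHAP package would replace these hand-written routines; the sketch is only meant to show how the penalty, the cross-validation loop, and the attribution step fit together.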
Publisher
Cold Spring Harbor Laboratory