Abstract
A crucial component of the treatment of genetic disorders is identifying and characterising the genes and gene modules that drive disease processes. Recent advances in Next-Generation Sequencing (NGS) improve the prospects for achieving this goal. However, many machine learning techniques are not explainable and fail to account for gene correlations. In this work, we develop a comprehensive set of explainable machine learning techniques to perform patient stratification for inflammatory bowel disease (IBD). We focus on Crohn's disease (CD) and its subtypes, CD with deep ulcer and CD without deep ulcer, together with IBD controls. We produce an interpretable probabilistic model over these disease subtypes using Gaussian Mixture Modelling. We then apply class-contrastive and feature-attribution techniques to identify potential target genes and modules. We modify the widely used kernelSHAP (Shapley Additive Explanations) algorithm to account for gene correlations, and we obtain relevant gene modules for each disease subtype. We develop a class-contrastive technique to visually explain why a particular patient is predicted to have a particular subtype of the disease. We show that our results are relevant to the disease through Gene Ontology enrichment analysis and a review of the literature. We also uncover some novel findings, including currently uncharacterised genes. These approaches may be beneficial in personalised medicine, informing decision-making on the diagnosis and treatment of genetic disorders. Our approach is model-agnostic and can potentially be applied to other diseases and domains where explainability and feature correlations are important.
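The abstract does not give the exact model or explanation procedure. The following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes one diagonal Gaussian component per subtype and uses invented weights, means and variances over three hypothetical genes. The posterior over components gives a probability for each subtype. The class-contrastive step sets one gene at a time to its mean under a contrast subtype and records how much the predicted probability of the target subtype drops. The subtype labels, the three-gene feature space and all numbers are hypothetical.

```python
import math

# Hypothetical toy model: one diagonal Gaussian component per subtype over 3 genes.
# All weights, means and variances are invented for illustration only.
SUBTYPES = ["CD_deep_ulcer", "CD_no_deep_ulcer", "IBD_control"]
WEIGHTS = [0.3, 0.4, 0.3]
MEANS = [[2.0, 1.5, 0.5], [1.0, 1.0, 1.0], [0.0, 0.5, 1.5]]
VARS = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal Gaussian evaluated at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior(x):
    """Posterior probability of each mixture component (subtype) given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(WEIGHTS, MEANS, VARS)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def class_contrastive(x, target, contrast):
    """For each gene, replace its value with the contrast subtype's mean and
    return the drop in posterior probability of the target subtype."""
    base = posterior(x)[target]
    deltas = []
    for g in range(len(x)):
        x_mod = list(x)
        x_mod[g] = MEANS[contrast][g]
        deltas.append(base - posterior(x_mod)[target])
    return deltas


patient = [1.9, 1.4, 0.6]  # hypothetical expression profile
probs = posterior(patient)
print({s: round(p, 3) for s, p in zip(SUBTYPES, probs)})
print(class_contrastive(patient, target=0, contrast=2))
```

In this toy setting the gene whose replacement causes the largest drop is the one that most separates the target subtype from the contrast subtype for this patient. Perturbing one gene at a time ignores correlations between genes, which is the limitation the abstract says the authors address.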
Publisher
Cold Spring Harbor Laboratory