Author:
Wang Yanran, Astrakhan Yuri, Petersen Britt-Sabina, Schreiber Stefan, Franke Andre, Bromberg Yana
Abstract
Background: After many years of concentrated research efforts, the exact cause of Crohn’s disease (CD) remains unknown. Its accurate diagnosis, however, helps in managing the disease and even in preventing its onset. Genome-wide association studies have identified 140 loci associated with CD, but these carry very small log odds ratios and are uninformative for diagnosis. Results: Here we describe a machine learning method – AVA,Dx (Analysis of Variation for Association with Disease) – that uses whole exome sequencing data to predict CD status. Using the person-specific variation in these genes from a panel of only 111 individuals, we built disease-prediction models informative of previously undiscovered disease genes. In this panel, our models differentiate CD patients from healthy controls with 71% precision and 73% recall at the default cutoff. By additionally accounting for batch effects, we are also able to predict CD status for previously unseen individuals from a separate CD study (84% precision, 73% recall). Conclusions: Larger training panels and additional features, including regulatory variants and environmental factors, e.g. human-associated microbiota, are expected to improve model performance. However, current results already position AVA,Dx both as an effective method for highlighting pathogenesis pathways and as a simple Crohn’s disease risk analysis tool that can improve clinical diagnostic time and accuracy.
Publisher
Cold Spring Harbor Laboratory