Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data
Published: 2024-08-02
Issue: 1
Volume: 10
ISSN: 2056-7189
Container-title: npj Systems Biology and Applications
Language: en
Short-container-title: npj Syst Biol Appl
Author:
van Hilten Arno, van Rooij Jeroen, Heijmans Bastiaan T., ’t Hoen Peter A. C., Meurs Joyce van, Jansen Rick, Franke Lude, Boomsma Dorret I., Pool René, van Dongen Jenny, Hottenga Jouke J., van Greevenbroek Marleen M. J., Stehouwer Coen D. A., van der Kallen Carla J. H., Schalkwijk Casper G., Wijmenga Cisca, Zhernakova Sasha, Tigchelaar Ettje F., Slagboom P. Eline, Beekman Marian, Deelen Joris, van Heemst Diana, Veldink Jan H., van den Berg Leonard H., van Duijn Cornelia M., Hofman Bert A., Isaacs Aaron, Uitterlinden André G., Jhamai P. Mila, Verbiest Michael, Suchiman H. Eka D., Verkerk Marijn, van der Breggen Ruud, van Rooij Jeroen, Lakenberg Nico, Mei Hailiang, van Iterson Maarten, van Galen Michiel, Bot Jan, van ’t Hof Peter, Deelen Patrick, Nooren Irene, Moed Matthijs, Vermaat Martijn, Luijk René, Jan Bonder Marc, van Dijk Freerk, Arindrarto Wibowo, Kielbasa Szymon M., Swertz Morris A., van Zwet Erik W., Ikram M. Arfan, Niessen Wiro J., van Meurs Joyce B. J., Roshchupkin Gennady V.
Abstract
Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90–1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
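The abstract does not spell out how the cohort-wise cross-validation folds were built. One common reading is a leave-one-cohort-out scheme, in which each cohort in turn is held out for testing while the others are used for training. The sketch below illustrates that reading only; the function name `leave_one_cohort_out` and the cohort labels "A" to "D" are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_idx, test_idx) once per distinct cohort.

    Assumption: every sample belongs to exactly one cohort, and a fold
    tests on one whole cohort while training on all remaining cohorts.
    """
    # Group sample indices by their cohort label.
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    # Hold out each cohort in turn (sorted for a reproducible fold order).
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i
            for cohort, members in by_cohort.items()
            if cohort != held_out
            for i in members
        )
        yield held_out, train_idx, test_idx


# Hypothetical cohort labels for eight samples drawn from four cohorts.
labels = ["A", "B", "A", "C", "D", "B", "C", "D"]
folds = list(leave_one_cohort_out(labels))

for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by cohort rather than by random sample means that a model is always evaluated on a cohort it has not seen, which is the property needed to judge generalizability across cohorts.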
Funder
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Publisher
Springer Science and Business Media LLC