Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data

Author:

van Hilten Arno, van Rooij Jeroen, Heijmans Bastiaan T., ’t Hoen Peter A. C., Meurs Joyce van, Jansen Rick, Franke Lude, Boomsma Dorret I., Pool René, van Dongen Jenny, Hottenga Jouke J., van Greevenbroek Marleen M. J., Stehouwer Coen D. A., van der Kallen Carla J. H., Schalkwijk Casper G., Wijmenga Cisca, Zhernakova Sasha, Tigchelaar Ettje F., Slagboom P. Eline, Beekman Marian, Deelen Joris, van Heemst Diana, Veldink Jan H., van den Berg Leonard H., van Duijn Cornelia M., Hofman Bert A., Isaacs Aaron, Uitterlinden André G., Jhamai P. Mila, Verbiest Michael, Suchiman H. Eka D., Verkerk Marijn, van der Breggen Ruud, van Rooij Jeroen, Lakenberg Nico, Mei Hailiang, van Iterson Maarten, van Galen Michiel, Bot Jan, van ’t Hof Peter, Deelen Patrick, Nooren Irene, Moed Matthijs, Vermaat Martijn, Luijk René, Jan Bonder Marc, van Dijk Freerk, Arindrarto Wibowo, Kielbasa Szymon M., Swertz Morris A., van Zwet Erik W., Ikram M. Arfan, Niessen Wiro J., van Meurs Joyce B. J., Roshchupkin Gennady V.

Abstract

Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90–1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
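The cohort-wise cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple leave-one-cohort-out scheme, in which each cohort is held out once while the remaining cohorts are used for training; the abstract does not state the exact split scheme used in the study, so this is an illustration and not the authors' procedure. The function name `leave_one_cohort_out` and the cohort labels "A" to "D" are hypothetical.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices), one fold per cohort.

    Assumption: a plain leave-one-cohort-out split; the study's actual
    scheme (e.g. a separate validation cohort) may differ.
    """
    # Group sample indices by the cohort they belong to.
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    # Hold out each cohort in turn; train on all samples from the others.
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i
            for cohort, indices in by_cohort.items()
            if cohort != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: eight samples drawn from four cohorts.
cohorts = ["A", "A", "B", "C", "C", "D", "B", "D"]
folds = list(leave_one_cohort_out(cohorts))

for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

Because every test fold contains samples from a cohort the model never saw during training, per-fold scores from a split like this reflect generalization across cohorts and not only across individuals.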

Funder

Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Publisher

Springer Science and Business Media LLC
