Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks

Author:

Becker Ann-Kristin, Ittermann Till, Dörr Markus, Felix Stephan B., Nauck Matthias, Teumer Alexander, Völker Uwe, Völzke Henry, Kaderali Lars, Nath Neetika

Abstract

Background: Flexible machine learning algorithms are valuable for uncovering disease-specific association patterns in epidemiological data. However, their limited interpretability can make it difficult to extract and understand those patterns correctly.

Method: We propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Using this combination, we discover and interpret broad association patterns of individual serum thyrotropin (TSH) concentrations, an important marker of thyroid function.

Results: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.

Conclusion: We demonstrate that the combination of random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are consistent with the current literature. They may be useful for future thyroid research and for improved dosing of therapeutics.
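The abstract reports three accuracy measures: root mean square error, mean absolute error and the coefficient of determination. As a reading aid, the following minimal sketch shows how these quantities are conventionally defined. The function names and the toy values are assumptions chosen for illustration; they are not the authors' code and not data from SHIP.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, NOT taken from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"R^2  = {r2(y_true, y_pred):.3f}")
```

Because the squared residuals weight large errors more heavily, the RMSE is never smaller than the MAE on the same predictions, which matches the ordering of the reported values (0.66 versus 0.55).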

Funder

Bundesministerium für Bildung und Forschung

Horizon 2020 Framework Programme

Volkswagen Foundation

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

