A comparative study of model-centric and data-centric approaches in the development of cardiovascular disease risk prediction models in the UK Biobank

Author:

Mamouei Mohammad (1, 2), Fisher Thomas (1, 2), Rao Shishir (1, 2), Li Yikuan (1, 2), Salimi-Khorshidi Ghomalreza (1, 2), Rahimi Kazem (1, 2, 3)

Affiliation:

1. Deep Medicine, Oxford Martin School, University of Oxford, 1st Floor, Hayes House, 75 George Street, Oxford OX1 2BQ, UK

2. Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK

3. NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK

Abstract

Aims

A diverse set of factors influence cardiovascular diseases (CVDs), but a systematic investigation of the interplay between these determinants and the contribution of each to CVD incidence prediction is largely missing from the literature. In this study, we leverage one of the most comprehensive biobanks worldwide, the UK Biobank, to investigate the contribution of different risk factor categories to more accurate incidence predictions in the overall population, by sex, different age groups, and ethnicity.

Methods and results

The investigated categories include the history of medical events, behavioural factors, socioeconomic factors, environmental factors, and measurements. We included data from a cohort of 405 257 participants aged 37–73 years and trained various machine learning and deep learning models on different subsets of risk factors to predict CVD incidence. Each of the models was trained on the complete set of predictors and subsets where each category was excluded. The results were benchmarked against QRISK3. The findings highlight that (i) leveraging a more comprehensive medical history substantially improves model performance. Relative to QRISK3, the best performing models improved the discrimination by 3.78% and improved precision by 1.80%. (ii) Both model- and data-centric approaches are necessary to improve predictive performance. The benefits of using a comprehensive history of diseases were far more pronounced when a neural sequence model, BEHRT, was used. This highlights the importance of the temporality of medical events that existing clinical risk models fail to capture. (iii) Besides the history of diseases, socioeconomic factors and measurements had small but significant independent contributions to the predictive performance.

Conclusion

These findings emphasize the need for considering broad determinants and novel modelling approaches to enhance CVD incidence prediction.
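The leave-one-category-out design described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature names are hypothetical placeholders rather than UK Biobank fields, and the concordance function assumes that "discrimination" refers to an AUROC-type metric, which the abstract does not specify.

```python
from typing import Dict, List, Sequence

# Hypothetical grouping of predictors into the five categories named in the
# abstract. The feature names are illustrative placeholders only.
CATEGORIES: Dict[str, List[str]] = {
    "medical_history": ["prior_diagnoses"],
    "behavioural": ["smoking_status"],
    "socioeconomic": ["deprivation_index"],
    "environmental": ["air_pollution"],
    "measurements": ["systolic_bp"],
}


def ablation_feature_sets(categories: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the complete predictor set plus one subset per excluded category."""
    sets = {"all": [f for feats in categories.values() for f in feats]}
    for excluded in categories:
        sets[f"without_{excluded}"] = [
            f
            for name, feats in categories.items()
            if name != excluded
            for f in feats
        ]
    return sets


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise concordance for binary labels (ties count as 0.5).

    Assumed stand-in for the discrimination metric; equals AUROC.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))


feature_sets = ablation_feature_sets(CATEGORIES)
```

Under this sketch, a model would be fitted once per entry in `feature_sets`, and the drop in `c_statistic` between the `"all"` model and each `"without_…"` model would indicate that category's contribution.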

Funder

PEAK

UKRI

British Heart Foundation

BHF

Oxford NIHR Biomedical Research Centre

Oxford Martin School

University of Oxford

Novo Nordisk

OMS

NIHR

Publisher

Oxford University Press (OUP)

Subject

Energy Engineering and Power Technology, Fuel Technology

Cited by 1 article.
