Interpretable Machine Learning Leverages Proteomics to Improve Cardiovascular Disease Risk Prediction and Biomarker Identification

Author:

Climente-González Héctor, Oh Min, Chajewska Urszula, Hosseini Roya, Mukherjee Sudipto, Gan Wei, Traylor Matthew, Hu Sile, Fatemifar Ghazaleh, Del Villar Paul Pangilinan, Vernet Erik, Koelling Nils, Du Liang, Abraham Robin, Li Chuan, Howson Joanna M. M.

Abstract

Cardiovascular diseases (CVD), primarily coronary heart disease and stroke, rank among the leading causes of long-term disability and mortality. Accurate disease risk prediction and the identification of genes associated with CVD are crucial for prevention, early intervention, and the development of novel medications.

The recent availability of UK Biobank proteomics data enables investigation of the blood proteome and its association with a wide variety of diseases. We employed the Explainable Boosting Machine (EBM), an interpretable machine learning model, for CVD risk prediction. The EBM model using proteomics outperforms traditional clinical models, with an AUROC of 0.767 and an AUPRC of 0.2405. Adding clinical features further improves the AUROC to 0.785 and the AUPRC to 0.2835. Our models demonstrate consistent performance across sexes and ethnicities.

While most prior studies using proteomics data for disease prediction have focused on maximizing accuracy at the population level, our model also provides individualized disease risk predictions and in-depth biological insight into biomarkers. Our analysis further uncovers nonlinear relationships between feature values and risk. We corroborate our findings using statistical approaches and evidence from the literature.

In conclusion, we present a highly accurate and explanatory framework for proteomics data analysis that offers in-depth molecular and clinical insights. Our findings support future approaches that prioritize individualized disease risk prediction and the identification of target genes for drug development.
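For readers less familiar with the two metrics reported above, the following is a minimal Python sketch of how AUROC and AUPRC can be computed from predicted risk scores. It is not the authors' evaluation code. The function names, the toy labels, and the toy scores are all illustrative assumptions, and AUPRC is estimated here by average precision, which is one common choice among several. The sketch also assumes there are no tied scores when computing average precision.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive case."""
    # Rank individuals from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 1 = developed the disease, 0 = did not.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
toy_prevalence = sum(toy_labels) / len(toy_labels)

print("AUROC:", toy_auroc)
print("Average precision:", toy_ap)
print("Prevalence (AUPRC of a random ranking):", toy_prevalence)
```

A random ranking has an expected AUROC of 0.5, whereas its expected AUPRC is roughly the outcome prevalence. AUPRC values are therefore best read against the event rate in the evaluated cohort rather than against 0.5.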

Publisher

Cold Spring Harbor Laboratory

