Interpretable Machine Learning Leverages Proteomics to Improve Cardiovascular Disease Risk Prediction and Biomarker Identification

Published: 2024-01-13

Author:

Climente-González Héctor, Oh Min, Chajewska Urszula, Hosseini Roya, Mukherjee Sudipto, Gan Wei, Traylor Matthew, Hu Sile, Fatemifar Ghazaleh, Del Villar Paul Pangilinan, Vernet Erik, Koelling Nils, Du Liang, Abraham Robin, Li Chuan, Howson Joanna M. M.

Abstract

Cardiovascular diseases (CVD), primarily coronary heart disease and stroke, rank amongst the leading causes of long-term disability and mortality. Accurate disease risk prediction and the identification of genes associated with CVD are crucial for prevention, early intervention, and the development of novel medications.

The recent availability of UK Biobank proteomics data enables investigation of the blood proteome and its association with a wide variety of diseases. We employed the Explainable Boosting Machine (EBM), an interpretable machine learning model, for CVD risk prediction. The EBM model trained on proteomics data outperforms traditional clinical models, with an AUROC of 0.767 and an AUPRC of 0.2405. Adding clinical features further improves the AUROC to 0.785 and the AUPRC to 0.2835. Our models demonstrate consistent performance across sexes and ethnicities.

While most prior studies using proteomics data for disease prediction have focused on maximizing accuracy at the population level, our model also provides individualized disease risk predictions and in-depth biological insights into biomarkers. Our analysis further uncovers nonlinear relationships between feature values and disease risk. We corroborate our findings using statistical approaches and evidence from the literature.

In conclusion, we present a highly accurate and explainable framework for proteomics data analysis that offers in-depth molecular and clinical insights. Our findings support future approaches that prioritize individualized disease risk prediction and the identification of target genes for drug development.
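
The abstract credits the EBM's interpretability for its individualized risk explanations and for exposing nonlinear feature effects. Both follow from its additive form: a person's log-odds is an intercept plus one learned shape function per feature, trained by cycling boosting updates through the features. The sketch below is a toy, stdlib-only illustration of that idea under stated assumptions, not InterpretML's `ExplainableBoostingClassifier` and not the authors' pipeline: features are assumed to be pre-binned, each update takes one Newton step per bin instead of fitting a shallow tree, and interaction terms, bagging, and early stopping are omitted. The helpers `fit_additive` and `explain` are hypothetical, and the data are simulated.

```python
"""Toy additive risk model trained by cyclic boosting, in the spirit of an EBM.

Simplifying assumptions (not InterpretML's algorithm): features are already
binned into integers 0..n_bins-1, each update takes one Newton step per bin
instead of fitting a shallow tree, and there are no interaction terms,
bagging, or early stopping. All names and data here are hypothetical.
"""
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_additive(X, y, n_bins, rounds=100, lr=0.1):
    """Return (intercept, shapes); shapes[j][b] is feature j's log-odds term for bin b."""
    n, d = len(X), len(X[0])
    base_rate = sum(y) / n
    intercept = math.log(base_rate / (1 - base_rate))  # log-odds of the base rate
    shapes = [[0.0] * n_bins for _ in range(d)]
    for _ in range(rounds):
        for j in range(d):  # round-robin: update one feature at a time
            grad = [0.0] * n_bins  # per-bin sum of (y - p)
            hess = [0.0] * n_bins  # per-bin sum of p * (1 - p)
            for xi, yi in zip(X, y):
                p = sigmoid(intercept + sum(shapes[k][xi[k]] for k in range(d)))
                grad[xi[j]] += yi - p
                hess[xi[j]] += p * (1 - p)
            for b in range(n_bins):
                if hess[b] > 0:
                    # Shrunken Newton step for the logistic loss in this bin
                    shapes[j][b] += lr * grad[b] / hess[b]
    # Center each shape over the training data and fold its mean into the
    # intercept; predictions are unchanged, but shapes become comparable.
    for j in range(d):
        mean_j = sum(shapes[j][xi[j]] for xi in X) / n
        shapes[j] = [v - mean_j for v in shapes[j]]
        intercept += mean_j
    return intercept, shapes


def explain(intercept, shapes, x):
    """Per-feature contributions to one person's log-odds, and the predicted risk."""
    contrib = [shapes[j][x[j]] for j in range(len(x))]
    return contrib, sigmoid(intercept + sum(contrib))


# Simulated data: feature 0 has a U-shaped (nonlinear) effect, feature 1 none.
rng = random.Random(0)
X, y = [], []
for _ in range(1000):
    x0, x1 = rng.randrange(5), rng.randrange(5)
    true_logit = -2.0 + (1.5 if x0 in (0, 4) else 0.0)
    X.append([x0, x1])
    y.append(1 if rng.random() < sigmoid(true_logit) else 0)

intercept, shapes = fit_additive(X, y, n_bins=5)
contrib, risk = explain(intercept, shapes, [0, 2])
print("feature 0 shape:", [round(v, 2) for v in shapes[0]])  # should be high at bins 0 and 4
print("feature 1 shape:", [round(v, 2) for v in shapes[1]])  # should stay near 0
print("contributions for [0, 2]:", [round(c, 2) for c in contrib], "risk:", round(risk, 3))
```

Plotting `shapes[j]` against the bin index gives a per-feature risk curve of the kind that can reveal nonlinear effects, and `contrib` breaks one person's score into per-feature terms. InterpretML's EBM exposes comparable global and local views through `explain_global()` and `explain_local()`.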
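
The abstract reports discrimination as AUROC and AUPRC, and the two scales read differently. AUROC is the probability that a randomly chosen case is scored above a randomly chosen control, so a random ranking gives 0.5. AUPRC, computed here as average precision, has a chance level of roughly the fraction of cases in the cohort rather than 0.5. That is why values such as 0.2405 and 0.2835 should be judged against the case fraction. The sketch below computes both metrics from scratch. The helpers `auroc` and `auprc` are hypothetical, the labels and scores are toy values rather than study data, and the study may have used a different AUPRC estimator (for example, trapezoidal interpolation of the precision-recall curve).

```python
"""AUROC and AUPRC (average precision) from scratch, stdlib only.

auroc and auprc are hypothetical helpers; labels and scores are toy values.
"""


def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random case is scored above a random control (ties = 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels: list[int], scores: list[float]) -> float:
    """Average precision: mean precision at the rank of each case (assumes no ties)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` people
    return total / n_pos


labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # 2 cases out of 10, prevalence 0.2
scores = [0.9, 0.8, 0.1, 0.4, 0.3, 0.2, 0.05, 0.15, 0.35, 0.25]
print(f"AUROC = {auroc(labels, scores):.4f}")  # AUROC = 0.9375
print(f"AUPRC = {auprc(labels, scores):.4f}")  # AUPRC = 0.8333
```

With a prevalence of 0.2, a random ranking of these toy labels would give an AUPRC near 0.2. The toy scores reach 0.8333 because both cases sit near the top of the ranking.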

Publisher

Cold Spring Harbor Laboratory