Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data-Reference-Cited by-同舟云学术

Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

Published:2024-05-02 Issue:1 Volume:24 Page:
ISSN:1472-6947
Container-title:BMC Medical Informatics and Decision Making
language:en
Short-container-title:BMC Med Inform Decis Mak

Author:

Drouard Gabin,Mykkänen Juha,Heiskanen Jarkko,Pohjonen Joona,Ruohonen Saku,Pahkala Katja,Lehtimäki Terho,Wang Xiaoling,Ollikainen Miina,Ripatti Samuli,Pirinen Matti,Raitakari Olli,Kaprio Jaakko

Abstract

Abstract Background Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. Methods We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. Results Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. Conclusions By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.

Funder

University of Helsinki

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12911-024-02521-3.pdf

Reference57 articles.

1. Roth GA, Mensah GA, Johnson CO, et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 study. J Am Coll Cardiol. 2020;76(25):2982–3021.

2. van der Harst P, Verweij N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ Res. 2018;122(3):433–43.

3. Shah S, Henry A, Roselli C, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun. 2020;11(1):163.

4. Leon-Mimila P, Wang J, Huertas-Vazquez A. Relevance of multi-omics studies in cardiovascular diseases. Front Cardiovasc Med. 2019;6:91.

5. Joshi A, Rienks M, Theofilatos K, Mayr M. Systems biology in cardiovascular disease: a multiomics approach. Nat Rev Cardiol. 2021;18(5):313–30.

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Processing imbalanced medical data at the data level with assisted-reproduction data as an example;BioData Mining;2024-09-04