Multivariate longitudinal data for survival analysis of cardiovascular event prediction in young adults: insights from a comparative explainable study

Published:2023-01-25 Issue:1 Volume:23 Page:
ISSN:1471-2288
Container-title:BMC Medical Research Methodology
language:en
Short-container-title:BMC Med Res Methodol

Author:

Nguyen Hieu T., Vasconcellos Henrique D., Keck Kimberley, Reis Jared P., Lewis Cora E., Sidney Steven, Lloyd-Jones Donald M., Schreiner Pamela J., Guallar Eliseo, Wu Colin O., Lima João A.C., Ambale-Venkatesh Bharath

Abstract

Background: Multivariate longitudinal data are under-utilized for survival analysis compared to cross-sectional (CS) data, that is, data collected once across a cohort. Particularly in cardiovascular risk prediction, despite available methods of longitudinal data analysis, the value of longitudinal information has not been established in terms of improved predictive accuracy and clinical applicability.

Methods: We investigated the value of longitudinal data over and above the use of cross-sectional data via 6 distinct modeling strategies from statistics, machine learning, and deep learning that incorporate repeated measures for survival analysis of the time-to-cardiovascular event in the Coronary Artery Risk Development in Young Adults (CARDIA) cohort. We then examined and compared the use of model-specific interpretability methods (Random Survival Forest Variable Importance) and model-agnostic methods (SHapley Additive exPlanation (SHAP) and Temporal Importance Model Explanation (TIME)) in cardiovascular risk prediction using the top-performing models.

Results: In a cohort of 3539 participants, longitudinal information from 35 variables that were repeatedly collected in 6 exam visits over 15 years improved subsequent long-term (17 years after) risk prediction by up to 8.3% in C-index compared to using baseline data (0.78 vs. 0.72), and up to approximately 4% compared to using the last observed CS data (0.75). Time-varying AUC was also higher in models using longitudinal data (0.86–0.87 at 5 years, 0.79–0.81 at 10 years) than in models using baseline or last observed CS data (0.80–0.86 at 5 years, 0.73–0.77 at 10 years). Comparative model interpretability analysis revealed the impact of longitudinal variables on model prediction on both the individual and global scales among different modeling strategies, and identified the best time windows and the best timing within each window for event prediction. The best strategy to incorporate longitudinal data for accuracy was time series massive feature extraction, and the most easily interpretable strategy was trajectory clustering.

Conclusion: Our analysis demonstrates the added value of longitudinal data in predictive accuracy and epidemiological utility in cardiovascular risk survival analysis in young adults via a unified, scalable framework that compares model performance and explainability. The framework can be extended to a larger number of variables and other longitudinal modeling methods.

Trial registration: ClinicalTrials.gov Identifier: NCT00005130, Registration Date: 26/05/2000.
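
The C-index figures in the Results are easier to read with the arithmetic spelled out. The percentages appear to be relative gains: 0.78 against 0.72 is about 8.3%, and 0.78 against 0.75 is 4%. The following is a minimal sketch, not the authors' code: `concordance_index` is a simplified Harrell's C-index that skips pairs with tied observed times, `relative_gain` is a hypothetical helper, and the toy data are invented purely for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (assumption: tied times are skipped).

    times  -- observed follow-up time per subject
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Toy data, invented for illustration only (not CARDIA data).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.7, 0.3, 0.1]

toy_c = concordance_index(toy_times, toy_events, toy_risks)  # 7 of 8 pairs
gain_vs_baseline = relative_gain(0.78, 0.72)  # longitudinal vs. baseline data
gain_vs_last_cs = relative_gain(0.78, 0.75)   # longitudinal vs. last observed CS data
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the absolute difference between 0.78 and 0.72 is 0.06 even though the relative gain is about 8.3%.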

Funder

National Institutes of Health

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics, Epidemiology

Link

https://link.springer.com/content/pdf/10.1186/s12874-023-01845-4.pdf

Reference: 71 articles.

1. Goldstein BA, Navar AM, Pencina MJ, Ioannidis J. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24(1):198–208.

2. Yang L, Yu M, Gao S. Prediction of coronary artery disease risk based on multiple longitudinal biomarkers. Stat Med. 2016;35(8):1299–314.

3. Miller RG, Anderson SJ, Costacou T, Sekikawa A, Orchard TJ. Hemoglobin A1c level and cardiovascular disease incidence in persons with type 1 diabetes: an application of joint modeling of longitudinal and time-to-event data in the Pittsburgh Epidemiology of Diabetes Complications Study. Am J Epidemiol. 2018;187(7):1520–9.