Advancing Alzheimer's Disease Risk Prediction: Development and Validation of a Machine Learning-Based Preclinical Screening Model (Preprint)

Published: 2023-12-06

Author:

Cao Shihua, Wang Bingsheng, Shi Yankai, Yao Jiani, Lou Xiajing, He Danni, Chen Yanfei, Qi Wenhao, Wang Bing, Dong Chaoqun, Zhu Xiaohong, Shi Aili, Cheng Lingling

Abstract

BACKGROUND

Alzheimer's disease (AD), the most prevalent form of dementia, poses a significant challenge for individuals aged 65 and older. Most existing AD risk prediction tools are highly accurate, but their complexity and limited accessibility hinder their practical use.

OBJECTIVE

Our goal was to leverage machine learning techniques to develop a prediction model that is not only highly efficient but also cost-effective.

METHODS

Using data from 2,968 individuals from the National Alzheimer's Coordinating Center, we constructed models with two popular machine learning algorithms, random forest and XGBoost (a gradient boosting machine), alongside traditional logistic regression. Model performance was evaluated on five key criteria: the Brier score, accuracy (ACC), specificity (SPE), sensitivity (SEN), and area under the receiver operating characteristic curve (AUC).
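To make the five evaluation criteria concrete, the following is a minimal sketch of how they can be computed from true labels and predicted probabilities. It is an illustration only, not the authors' code: the function names, the 0.5 decision threshold, and the toy data are assumptions introduced here, and the abstract does not state which threshold or software was used.

```python
from typing import Dict, Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return the five criteria; `threshold` is an assumed cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "brier": brier_score(y_true, y_prob),
        "acc": (tp + tn) / len(y_true),  # accuracy (ACC)
        "spe": tn / (tn + fp),           # specificity (SPE)
        "sen": tp / (tp + fn),           # sensitivity (SEN)
        "auc": auc(y_true, y_prob),
    }


# Toy data for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
print(evaluate(toy_labels, toy_probs))
```

ACC, SPE, and SEN depend on the chosen threshold, whereas the Brier score and AUC are computed directly from the predicted probabilities, which is why the latter two are often reported alongside threshold-based measures.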

RESULTS

The mean age of the 2,968 participants was 71.1 years (SD 6.8), and 60.3% were female. The prevalence of AD was 23.15% (n=687). The machine learning-based Boruta algorithm identified 16 significant predictors from 33 potential risk factors, with a minimum root mean squared error (RMSE) of 0.27 when the top 5 variables were selected (education level, depression, rapid eye movement sleep disorder, age, and anxiety). We used SHAP feature importance from the gradient boosting decision tree model to rank the top 20 significant predictors and, based on cross-validation results, selected the top 4 variables (education level, age, marital status, and depression) to construct our model. Both ensemble algorithms, XGBoost and random forest, performed better than the logistic regression model. XGBoost performed best, achieving an AUC of 0.78, ACC of 0.691, SPE of 0.677, SEN of 0.739, precision (PRE) of 0.403, and a Brier score of 0.140.
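The reported figures can be cross-checked with simple arithmetic. The sketch below derives the accuracy and precision implied by the reported sensitivity, specificity, and cohort prevalence, plus the Brier score of a constant prediction equal to the prevalence. It assumes that PRE denotes precision and that the evaluation set has the same AD prevalence as the full cohort (687 of 2,968); the abstract states neither, and the function names are introduced here for illustration.

```python
def implied_accuracy(sen: float, spe: float, prevalence: float) -> float:
    """Accuracy as the prevalence-weighted mix of sensitivity and specificity."""
    return sen * prevalence + spe * (1 - prevalence)


def implied_precision(sen: float, spe: float, prevalence: float) -> float:
    """Precision (positive predictive value) from SEN, SPE, and prevalence."""
    true_pos = sen * prevalence
    false_pos = (1 - spe) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def baseline_brier(prevalence: float) -> float:
    """Brier score of always predicting the prevalence as the risk."""
    return prevalence * (1 - prevalence)


# Values taken from the abstract; equal prevalence in the
# evaluation set is an assumption made for this check.
prevalence = 687 / 2968
sen, spe = 0.739, 0.677

print(f"prevalence        = {prevalence:.4f}")
print(f"implied accuracy  = {implied_accuracy(sen, spe, prevalence):.3f}")
print(f"implied precision = {implied_precision(sen, spe, prevalence):.3f}")
print(f"baseline Brier    = {baseline_brier(prevalence):.3f}")
```

Under these assumptions the implied accuracy is about 0.691, matching the reported ACC, and the implied precision is about 0.408, close to the reported 0.403. The uninformative baseline Brier score is about 0.178, so the reported 0.140 represents an improvement over predicting the prevalence for everyone. The low precision follows largely from the roughly 23% prevalence rather than from the sensitivity or specificity alone.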

CONCLUSIONS

Individual characteristics and psychological status are more critical predictors than past medical history. Machine learning-based AD risk assessment tools for older adults are easily accessible and show reasonable discrimination, which may be useful in guiding preclinical screening for AD in the elderly population.

Publisher

JMIR Publications Inc.
