Abstract
A recent meta-review on hypertension risk models detailed that differences in data and study setup have a large influence on performance, meaning model comparisons should be performed using the same study data. We compared five different machine learning algorithms and the externally developed Framingham risk model in predicting risk of incident hypertension using data from the Trøndelag Health Study. The dataset yielded n = 23722 individuals with p = 17 features recorded at baseline, before follow-up 11 years later. Individuals were without hypertension, diabetes, or history of CVD at baseline. Features included clinical measurements, serum markers, and questionnaire-based information on health and lifestyle. The included modelling algorithms varied in complexity from simpler linear predictors like logistic regression to the eXtreme Gradient Boosting algorithm. The other algorithms were Random Forest, Support Vector Machines, and K-Nearest Neighbors. After selecting hyperparameters using cross-validation on a training set, we evaluated the models' performance on discrimination, calibration, and clinical usefulness on a separate testing set using bootstrapping. Although the machine learning models displayed the best performance measures on average, the improvement over a logistic regression model fitted with elastic regularization was small. The externally developed Framingham risk model performed well on discrimination, but severely overestimated risk of incident hypertension on our data. After a simple recalibration, the Framingham risk model performed as well as or even better than some of the newly developed models on all measures. With the available data, this indicates that low-complexity models may suffice for long-term risk modelling. However, more studies are needed to assess potential benefits of a more diverse feature set.
This study marks the first attempt at applying machine learning methods and evaluating their performance on discrimination, calibration, and clinical usefulness within the same study on hypertension risk modelling.
Author summary
Hypertension, the state of persistent high blood pressure, is a largely symptom-free medical condition affecting millions of individuals worldwide, a number that is expected to rise in the coming years. While the consequences of unchecked hypertension are severe, lifestyle modifications have been proven effective in the prevention and treatment of hypertension. One possible tool for identifying individuals at risk of developing hypertension is the hypertension risk score, which calculates a probability of incident hypertension sometime in the future. We compared machine learning with more traditional tools for constructing risk models on a large Norwegian cohort, measuring performance by model validity and clinical usefulness. Using easily obtainable clinical information and blood biomarkers as inputs, we found no clear advantage in performance for the machine learning models. Only a few of our included inputs, namely systolic and diastolic blood pressure, age, and BMI, were found to be important for accurate prediction. This suggests that more diverse information on individuals, such as genetic, socio-economic, or dietary information, may be necessary for machine learning to excel over more established methods. A risk model developed using an American cohort, the Framingham risk model, performed well on our data after recalibration. Our study provides new insights into how machine learning may be used to enhance hypertension risk prediction.
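The text does not specify which "simple recalibration" was applied to the Framingham risk model. As a rough illustration only, the sketch below assumes an intercept-only recalibration (often called calibration-in-the-large): every predicted risk is shifted by a constant on the logit scale until the mean predicted risk matches the observed incidence. The function name `recalibrate_intercept` and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(pred_risks, outcomes, tol=1e-10):
    """Assumed intercept-only recalibration (not necessarily the study's method).

    Finds a constant shift `delta` on the logit scale such that the mean of
    the shifted risks equals the observed event rate. Requires predictions
    strictly inside (0, 1) and an observed rate strictly inside (0, 1).
    """
    observed = sum(outcomes) / len(outcomes)
    logits = [logit(p) for p in pred_risks]
    lo, hi = -20.0, 20.0
    # Bisection works because the mean shifted risk increases with delta.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(expit(z + mid) for z in logits) / len(logits)
        if mean_risk > observed:
            hi = mid
        else:
            lo = mid
    delta = (lo + hi) / 2.0
    return delta, [expit(z + delta) for z in logits]


# Toy data (hypothetical): an external model that overestimates risk.
pred = [0.30, 0.45, 0.60, 0.75, 0.50, 0.40, 0.65, 0.55]
outcomes = [0, 0, 1, 1, 0, 0, 1, 0]

delta, recal = recalibrate_intercept(pred, outcomes)
mean_before = sum(pred) / len(pred)
mean_after = sum(recal) / len(recal)
observed_rate = sum(outcomes) / len(outcomes)
```

Because the shift is a constant on the logit scale, the ranking of individuals is unchanged, so discrimination measures are unaffected while the overall overestimation is removed. A fuller recalibration would typically also re-estimate a slope, which this sketch leaves out.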
Publisher
Cold Spring Harbor Laboratory