Abstract
Introduction
We aimed to estimate the rate of kidney function decline over 10 years in the general population and to develop a machine learning model to predict it.
Methods
We used the JMDC database from 2012 to 2021, which includes company employees and their family members in Japan, where annual health checks are mandated for people aged 40–74 years. We estimated the slope (average annual change) of the estimated glomerular filtration rate (eGFR) over a period of 10 years. Then, using the annual health-check results and prescription claims for the first five years (2012 to 2016) as predictor variables, we developed an XGBoost model. We evaluated its predictive performance using the root mean squared error (RMSE), R², and the area under the receiver operating characteristic curve (AUROC) for rapid decliners (defined as a slope < -3 ml/min/1.73 m²/year) with 5-fold cross-validation, and compared these metrics with those of a linear regression model using only eGFR data from 2012 to 2016.
Results
We included 126 424 individuals (mean age, 45.2 years; male, 82.4%; mean eGFR, 79.0 ml/min/1.73 m² in 2016). The mean slope was -0.89 (standard deviation, 0.96) ml/min/1.73 m²/year. The predictive performance of the XGBoost model (RMSE, 0.78; R², 0.35; and AUROC, 0.89) was better than that of the linear regression model using only eGFR data (RMSE, 1.94; R², -3.03; and AUROC, 0.79).
Conclusion
Applying machine learning to annual health-check and claims data could predict the rate of kidney function decline, whereas the linear regression model using only eGFR data performed poorly.
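The quantities named in the abstract (eGFR slope, the rapid-decliner threshold, RMSE, R², and AUROC) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it omits the XGBoost model and cross-validation. The function names and toy numbers are hypothetical, the slope is assumed to be an ordinary least-squares fit of eGFR against year, and R² is assumed to be computed as 1 - SS_res/SS_tot.

```python
from math import sqrt

# Threshold taken from the abstract: slope < -3 ml/min/1.73 m2/year.
RAPID_DECLINE_THRESHOLD = -3.0


def ols_slope(years, egfr):
    """Least-squares slope of eGFR against time (assumed slope definition)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(egfr) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, egfr))
    return sxy / sxx


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted slopes."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - SS_res/SS_tot; negative when predictions are worse than the mean."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical individual: eGFR falls by 4 units per year.
slope = ols_slope([0, 1, 2, 3, 4], [90, 86, 82, 78, 74])
is_rapid = slope < RAPID_DECLINE_THRESHOLD

# Hypothetical observed and predicted slopes for four individuals.
observed = [-4.0, -1.0, -0.5, -3.5]
predicted = [-3.0, -1.5, -0.5, -2.5]
labels = [s < RAPID_DECLINE_THRESHOLD for s in observed]
# A steeper predicted decline should rank higher, so negate the prediction.
scores = [-p for p in predicted]

print(slope, is_rapid)
print(rmse(observed, predicted), r_squared(observed, predicted), auroc(labels, scores))
```

Under this definition, R² falls below zero whenever the squared prediction error exceeds the variance of the observed slopes around their mean, which is how a value such as -3.03 can arise for a poorly fitting model.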
Publisher
Cold Spring Harbor Laboratory