Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model

Author:

Lv Cai-Xia,An Shu-Yi,Qiao Bao-Jun,Wu Wei

Abstract

Abstract Background Hemorrhagic fever with renal syndrome (HFRS) is still attracting public attention because of its outbreak in various cities in China. Predicting future outbreaks or epidemics disease based on past incidence data can help health departments take targeted measures to prevent diseases in advance. In this study, we propose a multistep prediction strategy based on extreme gradient boosting (XGBoost) for HFRS as an extension of the one-step prediction model. Moreover, the fitting and prediction accuracy of the XGBoost model will be compared with the autoregressive integrated moving average (ARIMA) model by different evaluation indicators. Methods We collected HFRS incidence data from 2004 to 2018 of mainland China. The data from 2004 to 2017 were divided into training sets to establish the seasonal ARIMA model and XGBoost model, while the 2018 data were used to test the prediction performance. In the multistep XGBoost forecasting model, one-hot encoding was used to handle seasonal features. Furthermore, a series of evaluation indices were performed to evaluate the accuracy of the multistep forecast XGBoost model. Results There were 200,237 HFRS cases in China from 2004 to 2018. A long-term downward trend and bimodal seasonality were identified in the original time series. According to the minimum corrected akaike information criterion (CAIC) value, the optimal ARIMA (3, 1, 0) × (1, 1, 0)12 model is selected. The index ME, RMSE, MAE, MPE, MAPE, and MASE indices of the XGBoost model were higher than those of the ARIMA model in the fitting part, whereas the RMSE of the XGBoost model was lower. The prediction performance evaluation indicators (MAE, MPE, MAPE, RMSE and MASE) of the one-step prediction and multistep prediction XGBoost model were all notably lower than those of the ARIMA model. Conclusions The multistep XGBoost prediction model showed a much better prediction accuracy and model stability than the multistep ARIMA prediction model. The XGBoost model performed better in predicting complicated and nonlinear data like HFRS. Additionally, Multistep prediction models are more practical than one-step prediction models in forecasting infectious diseases.

Funder

National Natural Science Foundation of China

Department of Education of Liaoning Province

Publisher

Springer Science and Business Media LLC

Subject

Infectious Diseases

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3