Predicting the incidence of infectious diarrhea with symptom surveillance data using a stacking-based ensembled model

Published: 2024-02-26 Volume: 24 Issue: 1
ISSN: 1471-2334
Journal: BMC Infectious Diseases
Language: English
Journal abbreviation: BMC Infect Dis

Authors:

Wang Pengyu, Zhang Wangjian, Wang Hui, Shi Congxing, Li Zhiqiang, Wang Dahu, Luo Lei, Du Zhicheng, Hao Yuantao

Abstract

Background: Infectious diarrhea remains a major public health problem worldwide. This study used a stacking ensemble to develop a predictive model for the incidence of infectious diarrhea, aiming to achieve better prediction performance.

Methods: Based on surveillance data on infectious diarrhea cases, relevant symptoms, and meteorological factors in Guangzhou from 2016 to 2021, we developed four base prediction models using artificial neural networks (ANN), long short-term memory networks (LSTM), support vector regression (SVR), and extreme gradient boosting regression trees (XGBoost), which were then combined by stacking to obtain the final prediction model. All models were evaluated with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).

Results: Base models that incorporated symptom surveillance data and the weekly number of infectious diarrhea cases achieved lower RMSEs, MAEs, and MAPEs than models that incorporated meteorological data and the weekly number of infectious diarrhea cases. The LSTM had the best prediction performance among the four base models, with an RMSE, MAE, and MAPE of 84.85, 57.50, and 15.92%, respectively. The stacking ensemble model outperformed all four base models, with an RMSE, MAE, and MAPE of 75.82, 55.93, and 15.70%, respectively.

Conclusions: Incorporating symptom surveillance data could improve the predictive accuracy of infectious diarrhea prediction models, and symptom surveillance data were more effective than meteorological data in enhancing model performance. Combining multiple prediction models by stacking alleviated the difficulty of selecting the optimal model and yielded a model that performed better than any of the base models.
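The abstract names the stacking ensemble and the three evaluation metrics but does not show how they fit together. The following is a minimal sketch of one common stacking setup for a regression target, not the authors' implementation: it assumes a linear-regression meta-learner fitted by ordinary least squares on the base models' held-out predictions (the abstract does not state which meta-learner was used), and the helper names `fit_meta_learner` and `stack_predict` are hypothetical. It also implements MAE, RMSE, and MAPE as they are conventionally defined. Only the Python standard library is used, so the base-model predictions are passed in as plain lists.

```python
import math
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no week has zero cases."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_learner(base_preds: List[List[float]], y: List[float]) -> List[float]:
    """Hypothetical meta-learner: fit y ~ w0 + sum_k w_k * pred_k by least squares.

    base_preds[k][i] is base model k's held-out prediction for week i.
    Returns [w0, w1, ..., wK] via the normal equations (X^T X) w = X^T y.
    """
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(y))]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return _solve(xtx, xty)


def stack_predict(weights: List[float], base_preds: List[List[float]]) -> List[float]:
    """Combine base-model predictions with the fitted meta-learner weights."""
    n = len(base_preds[0])
    return [
        weights[0] + sum(w * p[i] for w, p in zip(weights[1:], base_preds))
        for i in range(n)
    ]
```

With four base models, `base_preds` would hold the ANN, LSTM, SVR, and XGBoost predictions for the same weeks. The meta-learner should be fitted on predictions the base models made for weeks they were not trained on (for example, a later validation period); otherwise it tends to over-weight whichever base model overfits the training data. A linear meta-learner is a deliberately simple choice here; other regressors can be used in the same role.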

Funder

Science and Technology Program of Guangzhou, China

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12879-024-09138-x.pdf

References: 46 articles

1. Abbafati C, Abbas KM, Abbasi M, Abbasifard M, Abbasi-Kangevari M, et al. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396(10258):1204–22.

2. Chen C, Guan Z, Huang CY, Jiang DX, Liu XX, et al. Epidemiological trends and hotspots of other infectious diarrhea (OID) in Mainland China: a Population-based Surveillance Study from 2004 to 2017. Front Public Health. 2021;9. https://doi.org/10.3389/fpubh.2021.679853.

3. Wang Y, Li J, Gu J, Zhou Z, Wang Z. Artificial neural networks for infectious diarrhea prediction using meteorological factors in Shanghai (China). Appl Soft Comput. 2015;35:280–90. https://doi.org/10.1016/j.asoc.2015.05.047.

4. Fang XY, Liu WD, Ai J, He MK, Wu Y, et al. Forecasting incidence of infectious diarrhea using random forest in Jiangsu Province, China. BMC Infect Dis. 2020;20(1):8. https://doi.org/10.1186/s12879-020-4930-2.

5. Berry AC. Syndromic surveillance and its utilisation for mass gatherings. Epidemiol Infect. 2019;147. https://doi.org/10.1017/s0950268818001735.

Cited by 1 article.

1. Managing the risks against carbon neutralization for green maritime transport. Journal of Cleaner Production. 2024-06.