Enhancing COVID-19 Epidemics Forecasting Accuracy by Combining Real-time and Historical Data from Social Media, Online News Articles, and Search Queries (Preprint)

Authors:

Li Jingwei, Huang Wei, Sia Choon Ling, Chen Zhuo, Wu Tailai, Wang Qingnan

Abstract

BACKGROUND

The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. More timely and accurate forecasting of COVID-19 epidemics is key to maintaining timely interventions and policies and to allocating resources efficiently. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and combining different Internet-based data sources has been shown to enhance epidemic forecasting accuracy more than using a single source. However, existing methods that incorporate multiple Internet-based data sources used only real-time data from these sources as exogenous inputs and did not take the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning of COVID-19 outbreaks has not been fully explored.

OBJECTIVE

The main aim of our study is to explore whether combining real-time and historical data from multiple Internet-based sources can improve COVID-19 forecasting accuracy over existing baseline models. A secondary aim is to explore the timeliness of COVID-19 forecasting based on different Internet-based data sources.

METHODS

We first used keyword-based methods, built on core terms and symptom-related keywords, to extract COVID-19-related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs that incorporates both the real-time and the historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all models were tested on the first wave of the COVID-19 epidemic in Hubei province and in the rest of mainland China separately. We also used lagged Pearson correlations to analyze COVID-19 forecasting timeliness.
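The autoregressive model with exogenous inputs described above can be illustrated with a minimal sketch. The abstract does not give the model's lag orders, estimation method, or variable names, so everything here is an assumption made for illustration: a single exogenous series `x` (for example, daily search volume), `p` autoregressive lags of the case counts `y`, the real-time value `x[t]` plus `q` historical lags of `x`, and ordinary least squares solved through the normal equations. It is not the authors' implementation.

```python
# Hypothetical ARX sketch: y[t] = c + sum_i a_i*y[t-i] + sum_j b_j*x[t-j],
# where j = 0 is the real-time exogenous value and j >= 1 are historical lags.

def build_design(y, x, p, q):
    """Build rows [1, y[t-1..t-p], x[t..t-q]] and targets y[t]."""
    start = max(p, q)
    rows, targets = [], []
    for t in range(start, len(y)):
        ar_terms = [y[t - i] for i in range(1, p + 1)]   # past case counts
        exo_terms = [x[t - j] for j in range(0, q + 1)]  # real-time + lagged signal
        rows.append([1.0] + ar_terms + exo_terms)
        targets.append(y[t])
    return rows, targets


def solve_least_squares(rows, targets):
    """Solve (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def forecast_one_step(beta, y_past, x_upto_now, p, q):
    """Forecast the next y; x_upto_now[-1] is the real-time value for that day."""
    row = (
        [1.0]
        + [y_past[-i] for i in range(1, p + 1)]
        + [x_upto_now[-1 - j] for j in range(0, q + 1)]
    )
    return sum(b * v for b, v in zip(beta, row))
```

With several Internet-based sources, each source would contribute its own block of real-time and lagged columns to the design row. In practice a statistics library would normally replace the hand-written solver; it is spelled out here only to keep the sketch self-contained.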

RESULTS

Our proposed model achieved the highest accuracy on all five accuracy measures compared with all the baseline models, in both Hubei province and the rest of mainland China. In mainland China excluding Hubei, the differences in COVID-19 forecasting accuracy between our proposed model (model i) and the baseline models were statistically significant (model 1, t=–8.722, P<.001; model 2, t=–5.000, P<.001; model 3, t=–1.882, P=.063; model 4, t=–4.644, P<.001; model 5, t=–4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model that used only historical counts of new confirmed COVID-19 cases (model 1, t=–1.732, P=.086). Our results also showed that Internet-based sources could provide warning of COVID-19 outbreaks 2 to 6 days earlier.
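The timeliness analysis relies on lagged Pearson correlations. A minimal sketch of that idea follows, assuming two equal-length daily series: `signal` (an Internet-based indicator) and `cases` (new confirmed case counts). The function names and the choice to report the lag with the highest correlation are illustrative assumptions, not details taken from the study.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlations(signal, cases, max_lag):
    """Correlate signal[t] with cases[t + lag] for lag = 0..max_lag (in days)."""
    result = {}
    for lag in range(max_lag + 1):
        shifted_signal = signal[: len(signal) - lag]  # drop the last `lag` days
        later_cases = cases[lag:]                     # drop the first `lag` days
        result[lag] = pearson(shifted_signal, later_cases)
    return result


def best_lead(signal, cases, max_lag):
    """Lag (in days) at which the signal correlates most strongly with later cases."""
    correlations = lagged_correlations(signal, cases, max_lag)
    return max(correlations, key=correlations.get)
```

A peak at a lag of, say, 3 days would suggest that the indicator tends to move about 3 days before the case counts, which is the sense in which a source gives earlier warning.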

CONCLUSIONS

Our approach, which incorporates real-time and historical data from multiple Internet-based sources, could improve forecasting accuracy for epidemics of COVID-19 and its variants. This may help public health agencies improve their interventions and resource allocation when mitigating and controlling new waves of COVID-19 or other epidemics.

Publisher

JMIR Publications Inc.
