Enhancing COVID-19 Epidemics Forecasting Accuracy by Combining Real-time and Historical Data from Social Media, Online News Articles, and Search Queries (Preprint)

Published: 2021-11-29

Author:

Li Jingwei, Huang Wei, Sia Choon Ling, Chen Zhuo, Wu Tailai, Wang Qingnan

Abstract

BACKGROUND

The SARS-CoV-2 virus and its variants are posing extraordinary challenges for public health worldwide. More timely and accurate forecasting of COVID-19 epidemics is key to maintaining timely interventions and policies and to allocating resources efficiently. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and combining different Internet-based data sources has enhanced epidemic forecasting accuracy more than using a single Internet-based data source. However, existing methods incorporating multiple Internet-based data sources used only real-time data from these sources as exogenous inputs and did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning for COVID-19 outbreaks has not been fully explored.

OBJECTIVE

The main aim of our study is to explore whether combining real-time and historical data from multiple Internet-based sources could improve the COVID-19 forecasting accuracy over the existing baseline models. A secondary aim is to explore the COVID-19 forecasting timeliness based on different Internet-based data sources.

METHODS

We first used keyword-based methods, built on core terms and symptom-related keywords, to extract COVID-19-related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all the Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating the real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all the models were tested during the first wave of COVID-19 epidemics in Hubei province and the rest of mainland China separately. We also used lagged Pearson correlations for the COVID-19 forecasting timeliness analysis.
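The abstract does not give the exact form of the proposed model, so the following is only a minimal sketch of a generic autoregressive model with exogenous inputs (ARX), fitted by ordinary least squares. The function names, the lag orders p and q, the synthetic data, and the choice of least squares are all assumptions made for illustration. The "real-time" inputs are represented by the exogenous values at time t and the "historical" inputs by their lags t-1 through t-q.

```python
import numpy as np

def build_arx_design(y, X, p, q):
    """Build the regression matrix for a generic ARX model (illustrative only).

    Each row is [1, y[t-1..t-p], X[t], X[t-1], ..., X[t-q]] for t >= max(p, q).
    X[t] stands in for real-time signals, X[t-1..t-q] for historical ones.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    start = max(p, q)
    rows = []
    for t in range(start, len(y)):
        ar_part = y[t - p:t][::-1]                # y[t-1], ..., y[t-p]
        exo_part = X[t - q:t + 1][::-1].ravel()   # X[t], X[t-1], ..., X[t-q]
        rows.append(np.concatenate(([1.0], ar_part, exo_part)))
    return np.array(rows), y[start:]

def fit_arx(y, X, p=1, q=1):
    """Estimate ARX coefficients by ordinary least squares (an assumption)."""
    A, target = build_arx_design(y, X, p, q)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

# Synthetic data (hypothetical): two exogenous signals, e.g. search and news volume.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.zeros(n)
for t in range(1, n):
    # Depends on yesterday's count, today's signal 0, and yesterday's signal 1.
    y[t] = 0.5 * y[t - 1] + 1.0 * X[t, 0] + 0.8 * X[t - 1, 1] + 0.01 * rng.normal()

# Coefficient order: intercept, y[t-1], X[t,0], X[t,1], X[t-1,0], X[t-1,1]
coef = fit_arx(y, X, p=1, q=1)
print(np.round(coef, 2))
```

In a real forecasting setting, using X[t] to predict y[t] assumes the Internet-based signals for day t are available before the official case count for that day is published.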

RESULTS

Our proposed model achieved the highest accuracy on all five accuracy measures, compared with all the baseline models, in both Hubei province and the rest of mainland China. In mainland China except Hubei, the COVID-19 epidemics forecasting accuracy differences between our proposed model (model i) and all the other baseline models were statistically significant (model 1, t=–8.722, P<.001; model 2, t=–5.000, P<.001; model 3, t=–1.882, P=.063; model 4, t=–4.644, P<.001; model 5, t=–4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using historical COVID-19 new confirmed case counts only (model 1, t=–1.732, P=.086). Our results also showed that Internet-based sources could provide early warning of COVID-19 outbreaks 2-6 days in advance.
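The timeliness result rests on lagged Pearson correlations. As a hedged illustration of that technique (not the authors' code), the sketch below correlates an Internet-based signal at day t with case counts at day t + lag and reports the lag with the highest correlation. The function name, the maximum lag, and the synthetic series are assumptions; the synthetic signal is constructed to lead the case series by 3 days.

```python
import numpy as np

def lagged_pearson(signal, cases, max_lag):
    """Pearson correlation between signal[t] and cases[t + lag], lag = 0..max_lag."""
    signal = np.asarray(signal, dtype=float)
    cases = np.asarray(cases, dtype=float)
    corrs = {}
    for lag in range(max_lag + 1):
        s = signal[:len(signal) - lag]   # signal on day t
        c = cases[lag:]                  # cases on day t + lag
        corrs[lag] = float(np.corrcoef(s, c)[0, 1])
    return corrs

# Synthetic example (hypothetical data): cases repeat the signal 3 days later.
rng = np.random.default_rng(1)
base = np.cumsum(rng.normal(size=83))
signal = base[3:]     # length 80
cases = base[:80]     # cases[t + 3] == signal[t] by construction

corrs = lagged_pearson(signal, cases, max_lag=7)
best_lag = max(corrs, key=corrs.get)
print(best_lag, round(corrs[best_lag], 3))
```

A positive best lag suggests that the signal tends to move ahead of the case counts by that many days. It is a correlation, not evidence of a causal lead.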

CONCLUSIONS

Our approach incorporating real-time and historical data from multiple Internet-based sources could improve forecasting accuracy for epidemics of COVID-19 and its variants, which may help public health agencies improve their interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other epidemics.

Publisher

JMIR Publications Inc.
