Author:
Wang Ligui, Liu Yuqi, Chen Hui, Qiu Shaofu, Liu Yonghong, Yang Mingjuan, Du Xinying, Li Zhenjun, Hao Rongzhang, Tian Huaiyu, Song Hongbin
Abstract
Search-engine-based surveillance methods for the early warning and prediction of infectious diseases cannot filter search engine keywords automatically or update them in real time, which leaves them ineffective for the early warning of emerging infectious diseases. The aim of this study is to develop an artificial intelligence (AI) method for search-engine-based surveillance to improve the early warning ability for emerging infectious diseases. A total of 32 keywords (444 million search queries) that may be related to the coronavirus disease (COVID-19) outbreak were collected from Baidu’s search engine database for the period from December 18, 2019 to February 11, 2020. A graph convolution network (GCN) model was used to select search engine keywords automatically, and multiple linear regression was then performed to explore the relationship between the daily query frequencies of the keywords and the daily number of new cases. The prediction trend of the GCN model was highly consistent with the true curve, with a mean absolute error of 81.65. Three keywords, “epidemic”, “mask” and “coronavirus”, were selected. The query volumes of the selected keywords were highly correlated with the daily number of confirmed cases (r = 0.96, 0.94, and 0.89; P < 0.01). An abnormal initial peak (3.05 times the normal volume) in queries appeared on December 31, 2019, which could have served as an early warning signal for an outbreak. Of particular concern, 17.5% of the query volume originated from Hubei Province, 51.15% of which was from Wuhan City. Using the selected keywords, the coefficients of determination (R²) of our constructed model were 0.88, 0.88, 0.84, 0.77, 0.77, 0.75, 0.73, and 0.73 for time lags of 0 to 7 days, respectively.
The constructed model was then applied to the Beijing Xinfadi outbreak as an independent test dataset; it successfully predicted the daily number of cases for the following days and detected an early signal of the outbreak (R² = 0.79). This paper establishes, for the first time, AI-based search-engine surveillance for the early detection of the COVID-19 epidemic. The model achieves automatic filtering and real-time updating of search engine keywords and can effectively detect the early signals of emerging infectious diseases.
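The time-lagged regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lagged_r2`, the synthetic data, and the choice of an ordinary least-squares fit via NumPy are all assumptions made for illustration. It regresses the case count at day t + lag on the keyword query frequencies at day t and reports the coefficient of determination for each lag.

```python
import numpy as np

def lagged_r2(queries, cases, lag):
    """R^2 of a least-squares fit predicting cases[t + lag] from queries[t].

    queries: array of shape (days, keywords), daily query frequencies.
    cases:   array of shape (days,), daily new case counts.
    """
    queries = np.asarray(queries, dtype=float)
    cases = np.asarray(cases, dtype=float)
    n = len(cases) - lag                      # number of usable (t, t + lag) pairs
    X = np.column_stack([np.ones(n), queries[:n]])  # intercept + keyword columns
    y = cases[lag:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic data only (hypothetical, not from the study): three keyword
# series, with cases constructed to follow the queries by exactly 2 days.
rng = np.random.default_rng(0)
queries = rng.random((56, 3))
weights = np.array([4.0, 2.0, 1.0])
cases = np.full(56, 5.0)
cases[2:] = queries[:-2] @ weights + 5.0

r2_by_lag = {lag: lagged_r2(queries, cases, lag) for lag in range(8)}
best_lag = max(r2_by_lag, key=r2_by_lag.get)
```

On this synthetic series the fit is essentially exact at the lag used to build the data, so `best_lag` recovers it. With real query and case data the R² values would instead vary gradually across lags, as the abstract reports for lags of 0 to 7 days.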
Funder
National Key R&D Program of China
Publisher
Springer Science and Business Media LLC
Subject
Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems