Extracting Self-Reported COVID-19 Symptom Tweets and Twitter Movement Mobility Origin/Destination Matrices to Inform Disease Models


Rosato Conor (1), Moore Robert E. (1), Carter Matthew (1), Heap John (2), Harris John (3), Storopoli Jose (4), Maskell Simon (1)


1. Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3GJ, UK

2. Computational Biology Facility, University of Liverpool, Liverpool L69 3GJ, UK

3. Public Health England, London NW9 5EQ, UK

4. Department of Computer Science, Universidade Nove de Julho—UNINOVE, Sao Paulo 03155-000, Brazil


The emergence of the novel coronavirus disease (COVID-19) generated a need to quickly and accurately assemble up-to-date information related to its spread. In this research article, we propose two ways in which Twitter data can inform models of the spread of COVID-19. (1) Machine learning algorithms trained on tweets in English, Spanish, German, Portuguese and Italian are used to identify tweets from potentially symptomatic individuals. Using the geo-location attached to each tweet, we map users to a geographic location to produce a time series of potentially symptomatic individuals. We calibrate an extended SEIRD epidemiological model with combinations of low-latency data feeds, including the symptomatic tweets, together with death data, and infer the parameters of the model. We then evaluate the usefulness of the data feeds when predicting daily deaths in 50 US states, 16 Latin American countries, 2 European countries and 7 NHS (National Health Service) regions in the UK. We show that, compared with using death data alone, including symptomatic tweets can improve the accuracy of COVID-19 death predictions, measured by mean squared error, by 6% on average in US states and by 17% on average in the rest of the world. (2) Origin/destination (O/D) matrices for movements between the seven NHS regions are constructed by identifying users who have tweeted twice within a 24 h period from two different locations. We show that increasing or decreasing a social connectivity parameter within an SIR model affects the rate at which a disease spreads.
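The O/D matrix construction described in (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_od_matrix`, the `(user_id, timestamp, region)` tuple layout, the region labels, and the choice to count only consecutive tweets by the same user are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_od_matrix(tweets, regions, window=timedelta(hours=24)):
    """Count region-to-region movements inferred from geo-located tweets.

    tweets  : iterable of (user_id, timestamp, region) tuples (assumed layout)
    regions : ordered list of region labels; fixes the row/column order
    window  : maximum gap between two tweets for them to count as a movement
    """
    index = {region: i for i, region in enumerate(regions)}
    matrix = [[0] * len(regions) for _ in regions]

    # Group each user's tweets so they can be ordered in time.
    by_user = defaultdict(list)
    for user, timestamp, region in tweets:
        by_user[user].append((timestamp, region))

    for events in by_user.values():
        events.sort()
        # Assumption: only consecutive tweets by the same user are compared.
        for (t0, origin), (t1, destination) in zip(events, events[1:]):
            if origin != destination and t1 - t0 <= window:
                matrix[index[origin]][index[destination]] += 1
    return matrix


# Illustrative data only: two hypothetical users and two region labels.
regions = ["North West", "London"]
tweets = [
    ("a", datetime(2020, 4, 1, 9), "North West"),
    ("a", datetime(2020, 4, 1, 18), "London"),      # 9 h later: counted
    ("b", datetime(2020, 4, 1, 9), "London"),
    ("b", datetime(2020, 4, 3, 9), "North West"),   # 48 h later: ignored
]
od = build_od_matrix(tweets, regions)
print(od)
```

Rows index the origin region and columns the destination, so each entry counts the movements inferred from one region to another. A real pipeline would also need to map raw tweet coordinates to regions, which this sketch takes as given.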



ESRC Centre for Doctoral Training on Quantification and Management of Risk and Uncertainty in Complex Systems Environments


EPSRC Centre for Doctoral Training in Distributed Algorithms

EPSRC through the Big Hypotheses




Information Systems

