BACKGROUND
Much public health research uses publicly available Twitter data, but it is not clear what types of research questions these data are being used to answer or to what extent these projects require ethical oversight.
OBJECTIVE
To describe the current state of public health research using Twitter data in terms of methods and research questions, geographic focus, and ethical considerations, including informed consent from Twitter users.
METHODS
We conducted a systematic review, following PRISMA guidelines, of articles published between January 2006 and October 31, 2019 that used Twitter data in secondary analyses for public health research. Articles were found using standardized search criteria on SocINDEX, PsycINFO, and/or PubMed. Studies that used Twitter for primary data collection, such as for study recruitment or as part of a dissemination intervention, were excluded.
RESULTS
We identified 367 articles that met eligibility criteria. Infectious disease (21.8%) and substance use (18.0%) were the most common topics for these studies, and sentiment mining (61.9%), surveillance (61.0%), and thematic exploration (59.1%) were the most common methodologies employed. About one-third of articles had a global/worldwide geographic focus; another third focused on the United States. The majority (60.5%) of articles used a native Twitter application programming interface (API), and a substantial share of the remainder (27.8%) used a third-party API. Only about one-third (32.3%) of studies sought institutional review board (IRB) approval, while 16.9% included identifying information on Twitter users and/or tweets and 35.7% attempted to anonymize identifiers. Most studies discussed the validity of their measures (73.6%) and the reliability of coding (69.7% for interrater reliability of human coding and 70.2% for computer algorithm checks), but less attention was paid to the sampling frame and to the underlying population the sample represented.
CONCLUSIONS
Twitter data may be useful in public health research, given that they are publicly available. However, studies should exercise greater caution in considering the data sources, the method of data access, and the external validity of the sampling frame. Further, an ethical framework is necessary to help guide future research in this area, especially when individual, identifiable Twitter users and tweets are shared and discussed.