BACKGROUND
Efforts to prevent and treat substance use disorders require real-time information on substance use trends. Social media have the potential to ameliorate weaknesses of surveys, electronic health records, and other data sources. Social media data are available in real time, unstructured (enabling measurement of novel topics), and often anonymous (increasing openness to discuss substance use).
OBJECTIVE
The aim of this study is twofold. First, we compare trends in discussion of four substance use topics on the social media platform Reddit to ground truth trends in their prevalence. Second, we measure and interpret trends for several emergent topics for which ground truth data are unavailable.
METHODS
We downloaded all posts from the r/opiates and r/OpiatesRecovery subreddits from 2016 to 2020. We measured trends in discussion of four topics (fentanyl overdose, oxycodone overdose, benzodiazepine overdose, and kratom use) using a word embedding–based keyword expansion tool. We sourced ground truth data on the prevalence of these topics from administrative data systems. We calculated Pearson correlations between trends in Reddit discussions and trends in ground truth prevalence. Finally, we measured Reddit discussion trends for five topics of emerging importance in opioid use treatment: gabapentin use and the potentiation of opioids with grapefruit juice, fatty foods, diphenhydramine, and cimetidine.
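The correlation step can be illustrated with a minimal sketch. It assumes comments are available as (year, text) pairs and that a plain keyword list stands in for the output of the keyword expansion tool. The function names, the substring matching, and the yearly aggregation are illustrative assumptions, not the study's actual pipeline.

```python
from collections import defaultdict
from math import sqrt


def topic_proportion_by_year(comments, keywords):
    """Return {year: share of comments that mention any keyword}.

    `comments` is an iterable of (year, text) pairs (assumed format).
    Matching is a case-insensitive substring test, a simplification.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in comments:
        totals[year] += 1
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up toy data for illustration only; not from the study.
    toy_comments = [
        (2016, "anyone tried kratom?"),
        (2016, "day 3 clean"),
        (2017, "kratom helped my withdrawal"),
        (2017, "Kratom dosing question"),
    ]
    shares = topic_proportion_by_year(toy_comments, ["kratom"])
    hypothetical_ground_truth = {2016: 10.0, 2017: 25.0}
    years = sorted(shares)
    print(pearson_r([shares[y] for y in years],
                    [hypothetical_ground_truth[y] for y in years]))
```

With only five yearly points per topic (2016 to 2020), a coefficient computed this way is sensitive to any single year, which is one reason to read such correlations cautiously.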
RESULTS
Trends in discussion of fentanyl and benzodiazepine overdose and kratom use correlated strongly with ground truth trends in terms of the proportion of comments discussing the topic (r > 0.9) and the proportion of posts discussing the topic (r > 0.7). For oxycodone overdose, the ground truth trend correlated inversely with comment and post trends (r < -0.4). For all topics, the proportion of unique authors discussing the topic differed from comments and posts in its correlation with ground truth. The proportion of comments and posts discussing potentiation with grapefruit juice, fatty foods, and diphenhydramine increased by up to 50%, and the proportion discussing gabapentin increased by over 150%. However, the proportion of unique authors discussing these topics stayed flat or decreased.
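The divergence between comment-based and author-based measures can be made concrete with a small sketch. It assumes each record is a (year, author, text) triple and uses substring keyword matching; the names `comment_share`, `author_share`, and `percent_change` are hypothetical helpers, and the data in the usage note are invented for illustration.

```python
def comment_share(records, keywords, year):
    """Share of one year's comments that mention any keyword."""
    texts = [text.lower() for y, _, text in records if y == year]
    matched = [t for t in texts if any(k in t for k in keywords)]
    return len(matched) / len(texts)


def author_share(records, keywords, year):
    """Share of one year's unique authors with at least one matching comment."""
    authors = set()
    matching_authors = set()
    for y, author, text in records:
        if y != year:
            continue
        authors.add(author)
        if any(k in text.lower() for k in keywords):
            matching_authors.add(author)
    return len(matching_authors) / len(authors)


def percent_change(start, end):
    """Relative change from `start` to `end`, in percent."""
    return (end - start) / start * 100.0


if __name__ == "__main__":
    # Invented records: one prolific author drives the comment share up.
    toy = [
        (2016, "a", "gabapentin question"),
        (2016, "b", "hi all"),
        (2020, "a", "gabapentin again"),
        (2020, "a", "more on gabapentin"),
        (2020, "a", "gabapentin and sleep"),
        (2020, "b", "hi all"),
    ]
    for measure in (comment_share, author_share):
        before = measure(toy, ["gabapentin"], 2016)
        after = measure(toy, ["gabapentin"], 2020)
        print(measure.__name__, percent_change(before, after))
```

In this toy case a single author posting more often raises the comment share while the author share is unchanged, which is one way the two measures can disagree.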
CONCLUSIONS
These results show both the promise and the risk of using social media to surveil substance use. Although the connection between online discussion and real-world prevalence is evident in many strong correlations, the relationship varied depending on the measure of discussion. Furthermore, discussions about oxycodone overdose increased while real-world oxycodone overdoses decreased. Real-world prevalence is one of many factors that could influence online discussion frequency. Therefore, we discourage undirected data mining and the interpretation of arbitrary trends as indicative of real-world prevalence. Nevertheless, as in the examples of gabapentin and opioid potentiators, online discussions can provide valuable information when tied to a specific hypothesis and measured holistically.