BACKGROUND
Social media and online discussion forums offer a unique data source for medical and public health research. On these platforms, people who use drugs often share valuable information, including adverse effects, formulations, and reasons for use.
OBJECTIVE
Because posts on these platforms are often unstructured, text and data mining methods are required to extract and analyze them systematically. This scoping review summarizes the literature on text and data mining methods for online substance use content.
METHODS
Online databases, including PubMed and EMBASE, were searched to identify articles meeting the eligibility criteria. Titles and abstracts were first screened by two reviewers, and any conflicts were resolved through discussion. Two reviewers then performed data extraction using the same template, and any disagreements were again resolved through discussion.
RESULTS
The search identified 1131 articles, 26 of which were included for data extraction. Most articles presented unique data mining methods. The five most common strategies were sentiment analysis, topic modeling, data classification, clustering, and association learning.
CONCLUSIONS
Data mining offers a valuable avenue for retrieving useful information from online discussion forums to supplement conventional data sources in medical and public health research. Association learning and regression analysis were particularly well suited to analyzing substance use content. Future research should focus on confirming the validity and reliability of these data mining methods and on establishing links between data mining, clinical evaluation, and knowledge translation.