Examining the Gateway Hypothesis and Mapping Substance Use Pathways on Social Media: A Machine Learning Approach (Preprint)

Authors:

Yuan Yunhao, Kasson Erin, Taylor Jordan, Cavazos-Rehg Patricia, De Choudhury Munmun, Aledavood Talayeh

Abstract

BACKGROUND

Substance misuse presents significant global public health challenges. Understanding transitions between substance types and the timing of shifts to polysubstance use is vital for targeted prevention, harm reduction, and recovery strategies. The longstanding gateway hypothesis suggests that high-risk substance use is preceded by lower-risk substance use. However, the source of this correlation is hotly contested. While some claim that low-risk substance use causes subsequent, riskier substance use, most users of low-risk substances do not escalate to higher-risk substances. Social media data have the potential to shed light on the factors contributing to substance use transitions.

OBJECTIVE

This study uses social media data to gain a better understanding of substance use pathways. By identifying and analyzing individuals' transitions between different risk levels of substance use, we aim to find specific linguistic cues in their social media posts that could indicate escalating or de-escalating patterns of substance use.

METHODS

We conducted a large-scale analysis of Reddit data collected between 2015 and 2019, comprising over 2.29 million posts and approximately 29.37 million comments by around 1.4 million users of substance use subreddits. From these data, we created a risk transition dataset reflecting the substance use behaviors of these users. We deployed deep learning and machine learning techniques, including fine-tuned BERT and RoBERTa models, to predict escalation or de-escalation in risk levels based on the initial transition phases documented in posts and comments. Additionally, we conducted an extensive linguistic analysis of the language patterns associated with transitions in substance use, emphasizing the role of n-gram features in predicting future risk trajectories.
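As a rough illustration of what a risk transition dataset with n-gram features might look like, the following minimal Python sketch labels consecutive posts by one user as escalation or de-escalation and counts word n-grams in the text preceding each transition. It is not the authors' pipeline: the subreddit names, the `RISK_LEVEL` mapping, and the functions `label_transitions` and `ngrams` are hypothetical placeholders, since the abstract does not specify how risk levels were assigned.

```python
from collections import Counter

# Hypothetical risk level per subreddit (assumption; the actual mapping is not given).
RISK_LEVEL = {"low_risk_sub": 0, "mid_risk_sub": 1, "high_risk_sub": 2}


def label_transitions(posts):
    """Label each change in risk level between consecutive posts of one user.

    posts: list of (timestamp, subreddit, text) tuples in any order.
    Returns a list of (label, text_before_transition) pairs.
    """
    ordered = sorted(posts, key=lambda p: p[0])  # chronological order
    labeled = []
    for prev, curr in zip(ordered, ordered[1:]):
        before, after = RISK_LEVEL[prev[1]], RISK_LEVEL[curr[1]]
        if after > before:
            labeled.append(("escalation", prev[2]))
        elif after < before:
            labeled.append(("de-escalation", prev[2]))
        # Posts at the same risk level are not transitions and are skipped.
    return labeled


def ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max using whitespace tokenization."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Toy, made-up posts for a single hypothetical user.
posts = [
    (2, "high_risk_sub", "trying something stronger"),
    (1, "low_risk_sub", "using every day now"),
    (3, "mid_risk_sub", "test kits and tapering"),
]
transitions = label_transitions(posts)
features = [(label, ngrams(text)) for label, text in transitions]
```

In a setup like this, the n-gram counts would serve as input features for a classifier, while the fine-tuned language models mentioned above would consume the raw text directly.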

RESULTS

Our models showed promise in predicting escalation or de-escalation in risk levels from Reddit users' historical data during initial transition phases across drug-related subreddits, achieving an accuracy of 78.48% and an F1-score of 79.20%. We identified the most important predictive features, such as specific substance names and tools indicative of future risk escalation. Our linguistic analysis showed that terms linked with harm reduction strategies were instrumental in signaling de-escalation, whereas descriptors of frequent substance use were characteristic of escalating transitions.
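For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1-score are commonly computed for a two-class task. It treats "escalation" as the positive class, which is an assumption: the abstract does not state which class was treated as positive or how the F1-score was averaged. The function name `accuracy_and_f1` and the toy labels are illustrative only and are not data from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive="escalation"):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return accuracy, f1


# Toy, made-up labels for illustration.
y_true = ["escalation", "escalation", "escalation", "de-escalation", "de-escalation"]
y_pred = ["escalation", "escalation", "de-escalation", "de-escalation", "escalation"]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Because F1 depends only on precision and recall for the positive class, it can exceed accuracy when the positive class is predicted comparatively well, which is consistent with an F1-score slightly above the accuracy.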

CONCLUSIONS

This study sheds light on the complexities surrounding the gateway hypothesis of substance use through an examination of online behavior on Reddit. While certain findings support the hypothesis, indicating a progression from lower-risk substances such as marijuana to higher-risk ones, a significant number of individuals did not exhibit this transition. The research underscores the potential of combining machine learning with social media analysis to predict substance use transitions. Our results emphasize the role of linguistic features as predictors and the importance of timely interventions.

Publisher

JMIR Publications Inc.
