Models of Gender Dysphoria Using Social Media Data for Use in Technology-Delivered Interventions: Machine Learning and Natural Language Processing Validation Study

Author:

Cascalheira Cory J, Flinn Ryan E, Zhao Yuxuan, Klooster Dannie, Laprade Danica, Hamdi Shah Muhammad, Scheer Jillian R, Gonzalez Alejandra, Lund Emily M, Gomez Ivan N, Saha Koustuv, De Choudhury Munmun

Abstract

Background: The optimal treatment for gender dysphoria is medical intervention, but many transgender and nonbinary people face significant treatment barriers when seeking help for gender dysphoria. When untreated, gender dysphoria is associated with depression, anxiety, suicidality, and substance misuse. Technology-delivered interventions for transgender and nonbinary people can be used discreetly, safely, and flexibly, thereby reducing treatment barriers and increasing access to psychological interventions to manage the distress that accompanies gender dysphoria. Technology-delivered interventions are beginning to incorporate machine learning (ML) and natural language processing (NLP) to automate intervention components and tailor intervention content. A critical step in using ML and NLP in technology-delivered interventions is demonstrating how accurately these methods model clinical constructs.

Objective: This study aimed to determine the preliminary effectiveness of modeling gender dysphoria with ML and NLP, using transgender and nonbinary people’s social media data.

Methods: Overall, 6 ML models and 949 NLP-generated independent variables were used to model gender dysphoria from the text data of 1573 Reddit (Reddit Inc) posts created on transgender- and nonbinary-specific web-based forums. After developing a codebook grounded in clinical science, a research team of clinicians and students experienced in working with transgender and nonbinary clients used qualitative content analysis to determine whether gender dysphoria was present in each Reddit post (ie, the dependent variable). NLP (eg, n-grams, Linguistic Inquiry and Word Count, word embedding, sentiment, and transfer learning) was used to transform the linguistic content of each post into predictors for ML algorithms. A k-fold cross-validation was performed. Hyperparameters were tuned with random search. Feature selection was performed to demonstrate the relative importance of each NLP-generated independent variable in predicting gender dysphoria. Misclassified posts were analyzed to improve future modeling of gender dysphoria.

Results: A supervised ML algorithm (ie, optimized extreme gradient boosting [XGBoost]) modeled gender dysphoria with a high degree of accuracy (0.84), precision (0.83), and speed (1.23 seconds). Of the NLP-generated independent variables, Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) clinical keywords (eg, dysphoria and disorder) were most predictive of gender dysphoria. Misclassifications of gender dysphoria were common in posts that expressed uncertainty, featured a stressful experience unrelated to gender dysphoria, were incorrectly coded, expressed insufficient linguistic markers of gender dysphoria, described past experiences of gender dysphoria, showed evidence of identity exploration, expressed aspects of human sexuality unrelated to gender dysphoria, described socially based gender dysphoria, expressed strong affective or cognitive reactions unrelated to gender dysphoria, or discussed body image.

Conclusions: Findings suggest that ML- and NLP-based models of gender dysphoria have significant potential to be integrated into technology-delivered interventions. The results contribute to the growing evidence on the importance of incorporating ML and NLP designs in clinical science, especially when studying marginalized populations.
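The modeling loop named in the Methods (n-gram predictors, k-fold cross-validation, and random search over hyperparameters) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: a small naive Bayes classifier stands in for XGBoost, the toy posts and labels are invented for illustration and are not study data, and every function name and the `alpha` smoothing hyperparameter are assumptions made for this example.

```python
import math
import random
from collections import Counter

# Invented toy posts (NOT study data); label 1 = construct coded as present.
posts = [
    "my dysphoria was intense today", "i feel dysphoria when i hear my voice",
    "dysphoria about my body makes mornings hard", "looking for ways to cope with dysphoria",
    "the dysphoria gets worse at work", "dysphoria hit me hard this week",
    "what is a good name change checklist", "sharing photos from pride last weekend",
    "any tips for finding a supportive doctor", "my family dinner went well yesterday",
    "looking for book recommendations this month", "happy about my new haircut today",
]
labels = [1] * 6 + [0] * 6

def ngrams(text, n_max=2):
    """Return word unigrams and bigrams for one post."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def train_nb(docs, ys, alpha):
    """Count n-grams per class; alpha is the additive smoothing strength."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, ys):
        counts[y].update(ngrams(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, Counter(ys), vocab, alpha

def predict_nb(model, doc):
    """Pick the class with the higher smoothed log-probability."""
    counts, class_totals, vocab, alpha = model
    n_docs = sum(class_totals.values())
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + alpha * len(vocab)
        score = math.log((class_totals[y] + 1) / (n_docs + 2))
        for feat in ngrams(doc):
            if feat in vocab:  # n-grams unseen in training are ignored
                score += math.log((counts[y][feat] + alpha) / total)
        scores[y] = score
    return max(scores, key=scores.get)

def kfold_accuracy(docs, ys, alpha, k=3, seed=0):
    """Mean held-out accuracy over k folds of a fixed shuffled split."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = train_nb([docs[i] for i in train], [ys[i] for i in train], alpha)
        correct = sum(predict_nb(model, docs[i]) == ys[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k

def random_search(docs, ys, n_iter=10, k=3, seed=0):
    """Sample alpha log-uniformly in [0.01, 10]; keep the best CV accuracy."""
    rng = random.Random(seed)
    best_alpha, best_acc = None, -1.0
    for _ in range(n_iter):
        alpha = 10 ** rng.uniform(-2, 1)
        acc = kfold_accuracy(docs, ys, alpha, k)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc

best_alpha, best_acc = random_search(posts, labels)
print(f"best alpha={best_alpha:.3f}, mean CV accuracy={best_acc:.2f}")
```

Every sampled hyperparameter value is scored on the same fold split, so candidates are compared on equal footing. A real replication would substitute the study's 949 predictors and its tuned XGBoost model for the stand-ins used here.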

Publisher

JMIR Publications Inc.

Subject

Health Informatics, Medicine (miscellaneous)

Cited by 2 articles.
