Analyzing Suicide Risk From Linguistic Features in Social Media: Evaluation Study

Author:

Lao CeciliaORCID,Lane JoORCID,Suominen HannaORCID

Abstract

BackgroundEffective suicide risk assessments and interventions are vital for suicide prevention. Although assessing such risks is best done by health care professionals, people experiencing suicidal ideation may not seek help. Hence, machine learning (ML) and computational linguistics can provide analytical tools for understanding and analyzing risks. This, therefore, facilitates suicide intervention and prevention.ObjectiveThis study aims to explore, using statistical analyses and ML, whether computerized language analysis could be applied to assess and better understand a person’s suicide risk on social media.MethodsWe used the University of Maryland Suicidality Dataset comprising text posts written by users (N=866) of mental health–related forums on Reddit. Each user was classified with a suicide risk rating (no, low, moderate, or severe) by either medical experts or crowdsourced annotators, denoting their estimated likelihood of dying by suicide. In language analysis, the Linguistic Inquiry and Word Count lexicon assessed sentiment, thinking styles, and part of speech, whereas readability was explored using the TextStat library. The Mann-Whitney U test identified differences between at-risk (low, moderate, and severe risk) and no-risk users. Meanwhile, the Kruskal-Wallis test and Spearman correlation coefficient were used for granular analysis between risk levels and to identify redundancy, respectively. In the ML experiments, gradient boost, random forest, and support vector machine models were trained using 10-fold cross validation. The area under the receiver operator curve and F1-score were the primary measures. Finally, permutation importance uncovered the features that contributed the most to each model’s decision-making.ResultsStatistically significant differences (P<.05) were identified between the at-risk (671/866, 77.5%) and no-risk groups (195/866, 22.5%). This was true for both the crowd- and expert-annotated samples. Overall, at-risk users had higher median values for most variables (authenticity, first-person pronouns, and negation), with a notable exception of clout, which indicated that at-risk users were less likely to engage in social posturing. A high positive correlation (ρ>0.84) was present between the part of speech variables, which implied redundancy and demonstrated the utility of aggregate features. All ML models performed similarly in their area under the curve (0.66-0.68); however, the random forest and gradient boost models were noticeably better in their F1-score (0.65 and 0.62) than the support vector machine (0.52). The features that contributed the most to the ML models were authenticity, clout, and negative emotions.ConclusionsIn summary, our statistical analyses found linguistic features associated with suicide risk, such as social posturing (eg, authenticity and clout), first-person singular pronouns, and negation. This increased our understanding of the behavioral and thought patterns of social media users and provided insights into the mechanisms behind ML models. We also demonstrated the applicative potential of ML in assisting health care professionals to assess and manage individuals experiencing suicide risk.

Publisher

JMIR Publications Inc.

Subject

Health Informatics,Medicine (miscellaneous)

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3