Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on Reddit during COVID-19: an observational study

Published: 2020-07-13

Authors: Daniel Mark Low, Laurie Rumker, Tanya Talkar, John Torous, Guillermo Cecchi, Satrajit S Ghosh

Abstract

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially affected as the initial wave of cases hit.

Objective: We leverage natural language processing (NLP) to characterize changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy), during the initial stage of the pandemic.

Methods: We create and release the Reddit Mental Health Dataset, which includes posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends in 90 text-derived features, such as sentiment, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support groups and interpret important features to understand how different problems manifest in language. We apply unsupervised methods, such as topic modeling and clustering, to uncover concerns throughout Reddit before and during the pandemic.

Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 as early as January, approximately two months before other support groups started posting about the pandemic. Many features increased significantly during COVID-19 for specific groups, including the categories “economic stress”, “isolation”, and “home”, while others, such as “motion”, decreased significantly. Of all the mental health groups, the support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses.
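The methods described above start from numeric features computed per post, such as personal pronouns and semantic categories like “economic stress” and “isolation”. As a rough illustration of how such lexicon-style features can be derived, the sketch below counts word-list matches and normalizes them by post length. The word lists in `LEXICONS` and the function names are hypothetical; the abstract does not list the study's lexicons or the tooling used to extract its 90 features.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only; the study's actual
# feature definitions and category word lists are not given in the abstract.
LEXICONS = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "economic_stress": {"rent", "bills", "debt", "unemployed", "paycheck"},
    "isolation": {"alone", "lonely", "isolated", "quarantine"},
}

# Very simple tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def lexicon_features(text: str) -> dict[str, float]:
    """Return, for each lexicon, the fraction of tokens that belong to it."""
    tokens = tokenize(text)
    if not tokens:
        # An empty post contributes zero to every category.
        return {name: 0.0 for name in LEXICONS}
    counts = Counter(tokens)
    return {
        name: sum(counts[w] for w in words) / len(tokens)
        for name, words in LEXICONS.items()
    }


if __name__ == "__main__":
    print(lexicon_features("I am alone and my rent is due"))
```

Normalizing by token count keeps long and short posts comparable; per-post values like these could then be averaged per group and time window before fitting trends, though how the study aggregated its features is not stated in the abstract.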
We also provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history: the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P < .001). Using unsupervised clustering, we find that the Suicidality and Loneliness clusters more than doubled in number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters centered on Self-Harm and Entertainment emerged.

Conclusions: By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns in how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive enough to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19; it may therefore be useful during the ongoing pandemic and other world-changing events, such as elections and protests, whether present or past.
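The abstract reports that mental health groups became linguistically closer to r/HealthAnxiety as COVID-19 posting increased (ρ = -0.96). It does not specify the distance measure or the unit of analysis, so the sketch below makes two assumptions: distance is the cosine distance between a group's mean feature vector and r/HealthAnxiety's mean feature vector in each time window, and ρ is Spearman's rank correlation (the quantity ρ conventionally denotes) between that distance and the share of posts mentioning COVID-19. All numbers are made up, and the function names are hypothetical.

```python
import math
from statistics import fmean


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def average_ranks(xs: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up toy data (not from the study), one entry per time window:
# the share of one group's posts that mention COVID-19, and that group's
# mean feature vector; health_anxiety is r/HealthAnxiety's mean vector.
health_anxiety = [0.30, 0.10, 0.20]
covid_share = [0.01, 0.05, 0.20, 0.40]
group_centroids = [
    [0.05, 0.40, 0.10],
    [0.10, 0.35, 0.12],
    [0.20, 0.20, 0.18],
    [0.28, 0.12, 0.20],
]

distances = [cosine_distance(c, health_anxiety) for c in group_centroids]
rho = spearman_rho(covid_share, distances)

if __name__ == "__main__":
    print([round(d, 3) for d in distances], round(rho, 3))
```

With these invented values the distance shrinks in every window as the COVID-19 share grows, so ρ comes out at -1 up to floating-point rounding; a negative ρ is what "more COVID-19 posting, less distance to r/HealthAnxiety" looks like numerically. The sketch omits a significance test; in practice `scipy.stats.spearmanr` returns both the correlation and a p-value.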

Publisher: Center for Open Science

Cited by 7 articles.

1. Mental Health Detection using Machine Learning; International Journal of Innovative Science and Research Technology (IJISRT); 2024-04-25

2. The first year of the Covid-19 pandemic through the lens of r/Coronavirus subreddit: an exploratory study; Health and Technology; 2023-02-21

3. Support or Suppress? Research on the Mechanism of Employee’s GNS on Innovation Performance: From the Perspective of Status Competition; Frontiers in Psychology; 2022-06-22

4. Surveillance of COVID-19 pandemic using social media; Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics; 2021-08

5. Studies of Depression and Anxiety Using Reddit as a Data Source: Scoping Review (Preprint); 2021-04-13