Enabling Early Health Care Intervention by Detecting Depression in Users of Web-Based Forums using Language Models: Longitudinal Analysis and Evaluation

Published: 2023-03-24
Volume: 2
Page: e41205
ISSN: 2817-1705
Container-title: JMIR AI
Language: en
Short-container-title: JMIR AI

Author:

Owen David, Antypas Dimosthenis, Hassoulas Athanasios, Pardiñas Antonio F, Espinosa-Anke Luis, Collados Jose Camacho

Abstract

Background: Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter) such as cognitive distortions are observed in verbal communication, which can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums.

Objective: The objective of this study was 2-fold. We sought to gauge the effectiveness of different machine learning approaches to identify users of the mass web-based forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in performing this detection.

Methods: A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets provided users with an estimated date of depression diagnosis. This derived data set was used as an input for several machine learning classifiers, including transformer-based language models (LMs).

Results: Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1-score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a depressed user's estimated diagnosis date are the most indicative of their illness, with data before that period not helping the models detect more accurately. Furthermore, in the 4- to 8-week period before the user's estimated diagnosis date, their posts exhibited more negative sentiment than any other 4-week period in their post history.

Conclusions: Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk for this condition.
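The abstract describes grouping a user's posts by how long before the estimated diagnosis date they were written, and scoring binary classifiers by F1. A minimal Python sketch of both ideas follows. It is illustrative only: the function names `bucket_posts` and `f1_score`, the 4-week default window, and the choice to drop posts made on or after the diagnosis date are assumptions, not details taken from the paper.

```python
from collections import defaultdict
from datetime import date


def bucket_posts(posts, diagnosis_date, window_weeks=4):
    """Group (post_date, text) pairs into windows counted back from diagnosis_date.

    Window 0 holds posts made 1-28 days before the diagnosis date,
    window 1 holds posts made 29-56 days before, and so on (for the
    assumed 4-week default). Posts on or after the diagnosis date are
    dropped (an assumption of this sketch).
    """
    window_days = window_weeks * 7
    buckets = defaultdict(list)
    for post_date, text in posts:
        days_before = (diagnosis_date - post_date).days
        if days_before <= 0:
            continue  # not written before the estimated diagnosis
        buckets[(days_before - 1) // window_days].append(text)
    return dict(buckets)


def f1_score(y_true, y_pred, positive=1):
    """Standard F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    diagnosis = date(2020, 6, 1)  # hypothetical estimated diagnosis date
    posts = [
        (date(2020, 5, 25), "a"),  # 7 days before
        (date(2020, 4, 27), "b"),  # 35 days before
        (date(2020, 6, 2), "c"),   # after diagnosis, dropped
    ]
    print(bucket_posts(posts, diagnosis))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Under this indexing, the 4- to 8-week period mentioned in the results corresponds to window 1, and the final 12 to 16 weeks correspond to keeping windows 0 through 2 or 0 through 3.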

Publisher

JMIR Publications Inc.

Cited by 3 articles.

1. Artificial Intelligence for Analyzing Psychiatric Disorders in Social Media: A Quarter-Century Narrative Review of Progress and Challenges (Preprint);2024-04-23

2. Leveraging LLM-Generated Data for Detecting Depression Symptoms on Social Media;Lecture Notes in Computer Science;2024

3. Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda;2023 14th International Conference on Information and Communication Technology Convergence (ICTC);2023-10-11