BACKGROUND
The COVID-19 pandemic has severely affected people’s daily lives and caused tremendous economic loss worldwide. Anecdotal evidence suggests that the pandemic has increased depression levels in the population. However, systematic studies of depression detection and monitoring during the pandemic are lacking.
OBJECTIVE
This study aims (1) to develop a method to accurately identify people with depression by analyzing their tweets and (2) to monitor population-level depression on Twitter.
METHODS
To study this subject, we design an effective regular expression-based search method and create by far the largest English Twitter depression dataset, containing 2,575 distinct users identified as having depression (N=2,575) along with their past tweets. To examine the effect of depression on people’s Twitter language, we train three transformer-based depression classification models on the dataset, evaluate their performance with progressively larger training sets, and compare the models’ “tweet chunk”-level and user-level performance. Furthermore, inspired by psychological studies, we create a fusion classifier that combines deep learning model scores with psychological text features and users’ demographic information, and we investigate how these features relate to depression signals. Finally, we demonstrate our model’s capability to monitor both group-level and population-level depression trends by presenting two applications during the COVID-19 pandemic.
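As a rough illustration of two steps named above (regular expression-based search and the relation between “tweet chunk”-level and user-level scores), a minimal sketch follows. The pattern, the function names, the chunking scheme, and the use of a simple mean for aggregation are all assumptions made for illustration; the abstract does not specify the expressions or the aggregation rule actually used.

```python
import re
from statistics import mean

# Hypothetical pattern for self-reported diagnoses, e.g.
# "I was diagnosed with depression". Not the study's actual expression.
SELF_REPORT = re.compile(
    r"\bI\s+(?:am|was|have\s+been|got)\s+diagnosed\s+with\s+"
    r"(?:\w+\s+){0,2}?depression\b",
    re.IGNORECASE,
)


def is_self_report(tweet: str) -> bool:
    """Return True if the tweet matches the illustrative self-report pattern."""
    return SELF_REPORT.search(tweet) is not None


def chunk_tweets(tweets: list[str], size: int) -> list[list[str]]:
    """Split a user's tweets into consecutive chunks of at most `size` tweets."""
    return [tweets[i:i + size] for i in range(0, len(tweets), size)]


def user_level_score(chunk_scores: list[float]) -> float:
    """Aggregate per-chunk classifier scores into one user-level score.

    Averaging is an assumption; other rules (e.g. majority vote) are possible.
    """
    return mean(chunk_scores)
```

In this sketch, a user would be flagged for the depression group if any of their tweets satisfies `is_self_report`, and a classifier applied to each chunk from `chunk_tweets` would supply the scores passed to `user_level_score`.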
RESULTS
Our fusion model achieves an accuracy of 78.9% on a test set of 446 people (N=446), half of whom are identified as having depression. Conscientiousness, neuroticism, use of first-person pronouns, talking about biological processes such as eating and sleeping, talking about power, and expressing sadness are shown to be important features for depression classification. Furthermore, when used to monitor depression trends, our model shows that, based on their tweets, users with depression generally respond to the pandemic later than the control group. Three US states, New York (NY), California (CA), and Florida (FL), show a depression trend similar to that of the whole US population. Compared with people in NY and CA, people in FL show a significantly lower level of depression.
CONCLUSIONS
This study proposes an efficient method for analyzing the depression levels of different groups of people on Twitter. We hope this study can raise awareness among researchers and the general public of COVID-19’s impact on people’s mental health. The non-invasive monitoring system can also be rapidly adapted to major events other than COVID-19 and might be useful during future outbreaks.