BACKGROUND
The advent of social media platforms such as X (formerly known as Twitter) provides a way to unobtrusively monitor and mine user-generated content and to apply advanced natural language processing (NLP) and text-mining algorithms to detect mental illnesses such as depression.
OBJECTIVE
Using Twitter data, this study examines how depression markers change over the progression of the disease in individuals. Our goals are to (1) analyze Twitter data to identify temporal changes in depression markers in the 90 days before and after a clinical diagnosis; (2) use topic modeling to extract and analyze key themes related to depression from the tweets of diagnosed individuals; (3) evaluate the effectiveness of machine learning classifiers in distinguishing between depressed and non-depressed users based on tweet content; and (4) provide insights into how the progression of depression and its markers can be tracked and understood through temporal analysis of social media data.
METHODS
We identified 229 depressed individuals and gathered 246,637 tweets that they posted over 180 days (90 days before and 90 days after diagnosis). CorEx (Correlation Explanation) topic modeling was used to mine the tweets and extract themes that characterize depression-related discourse. Conditional logistic regression was then used to assess the odds of each theme occurring in tweets in the post-diagnosis period compared with the pre-diagnosis period. Three machine learning classifiers (support vector machine, naive Bayes, and logistic regression) were built and tested to distinguish depressed users from others.
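To illustrate the pre/post comparison, the sketch below covers the simplest matched design: each user contributes one pair of binary flags (theme present before diagnosis, theme present after diagnosis). In that 1:1 case with a single binary covariate, the conditional logistic regression estimate of the odds ratio reduces to the ratio of discordant pairs. The pair encoding, the function name, and the toy data are assumptions made for illustration; the abstract does not specify how the study's model was set up.

```python
import math


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched binary data (illustrative sketch).

    pairs: iterable of (pre, post) booleans, one per user, where each flag
    says whether a theme appeared in that period (hypothetical encoding).
    Returns (odds_ratio, ci_lower, ci_upper) using a Wald interval on the
    log odds ratio.
    """
    pairs = list(pairs)
    # Only discordant pairs carry information in the conditional likelihood.
    post_only = sum(1 for pre, post in pairs if post and not pre)
    pre_only = sum(1 for pre, post in pairs if pre and not post)
    if post_only == 0 or pre_only == 0:
        raise ValueError("need discordant pairs in both directions")

    odds_ratio = post_only / pre_only
    se_log_or = math.sqrt(1 / post_only + 1 / pre_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Toy data, not taken from the study: 20 users mention the theme only after
# diagnosis, 10 only before, and 20 are concordant.
toy_pairs = (
    [(False, True)] * 20
    + [(True, False)] * 10
    + [(True, True)] * 15
    + [(False, False)] * 5
)
toy_or, toy_lower, toy_upper = matched_pair_odds_ratio(toy_pairs)
print(f"OR={toy_or:.2f}, 95% CI {toy_lower:.2f}-{toy_upper:.2f}")
```

An odds ratio above 1 means the theme is more likely to appear after diagnosis than before; concordant users (theme present in both periods or in neither) do not affect the estimate.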
RESULTS
Our analysis yielded seven themes related to depression: causes, physical symptoms, mental symptoms, swear words, treatment, coping and support mechanisms, and lifestyle. The odds ratios for tweeting about causes, physical symptoms, mental symptoms, treatment, and coping and support mechanisms in the post-diagnosis period compared with the pre-diagnosis period were 2.22 (95% CI 1.29-3.82), 0.32 (95% CI 0.14-0.71), 0.74 (95% CI 0.62-0.89), 3.1 (95% CI 1.71-5.61), and 1.86 (95% CI 1.24-2.81), respectively. Among the machine learning classifiers tested, logistic regression yielded the best performance (area under the curve [AUC]=0.91) in distinguishing depressed users from others.
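For readers less familiar with the AUC metric reported above, the following minimal sketch computes it directly from its definition: the probability that a randomly chosen positive (depressed) user receives a higher classifier score than a randomly chosen negative user, with ties counted as one half. The function name, labels, and scores are hypothetical toy values, not the study's data or evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (illustrative sketch).

    labels: sequence of 0/1 class labels (1 = depressed user, assumed encoding).
    scores: sequence of classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: one positive user (score 0.4) is ranked below one negative (0.5).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC={toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.91 indicates that the classifier ranks a depressed user above a non-depressed user in roughly 91% of such pairs.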
CONCLUSIONS
Temporal analysis of Twitter data provides a comprehensive view of depression progression in patients. In addition to identifying changing comorbidities and mental symptoms, it can help track patients’ use of coping and support mechanisms and treatments, as well as the causes of depression.