BACKGROUND
Frequent interaction with mental health professionals is required to screen, diagnose, and track mental health disorders. However, high costs and insufficient access can make frequent interaction difficult. The ability to assess a mental health disorder passively and at frequent intervals could be a useful complement to conventional treatment. It may be possible to passively assess clinical symptoms with high frequency by characterizing alterations in speech collected through personal smartphones or wearable devices. The association between speech features and mental health disorders could be leveraged as an objective screening tool.
OBJECTIVE
To evaluate the performance of a model that predicts the presence of Generalized Anxiety Disorder (GAD) from acoustic and linguistic features of impromptu speech.
METHODS
A total of 2,000 participants were recruited and took part in a single online session. They completed the Generalized Anxiety Disorder 7-item (GAD-7) scale and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test (TSST). We used linguistic and acoustic features found to be associated with anxiety disorders in previous studies, along with demographic information, to predict whether participants fell above or below the screening threshold for GAD, defined as a GAD-7 score of 10. Separate models for each sex were also evaluated. To evaluate model performance, we report the mean area under the receiver operating characteristic curve (AUROC) from repeated 5-fold cross-validation.
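The evaluation procedure described above can be illustrated with a minimal, standard-library Python sketch. It is not the study's code. It assumes that a GAD-7 total of exactly 10 counts as screen-positive, that folds are stratified by label, and that 3 repeats are used (the abstract specifies none of these). All function names are hypothetical, and the model is passed in as `fit` and `predict` callables rather than implemented as logistic regression.

```python
import random
from statistics import mean

GAD7_THRESHOLD = 10  # screening threshold named in the abstract


def gad7_label(item_scores):
    """Return 1 if the GAD-7 total meets the threshold, else 0.

    Assumption: seven items scored 0-3, and a total of exactly 10 is
    treated as screen-positive.
    """
    if len(item_scores) != 7 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected seven item scores, each in 0..3")
    return int(sum(item_scores) >= GAD7_THRESHOLD)


def auroc(labels, scores):
    """AUROC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(labels, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for every fold of every shuffled repeat.

    Assumption: stratified by label so each test fold contains both classes
    (requires at least k samples per class).
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for position, i in enumerate(idx):
                folds[position % k].append(i)
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, test


def mean_cv_auroc(features, labels, fit, predict, k=5, repeats=3, seed=0):
    """Mean held-out AUROC over all folds; fit/predict are caller-supplied."""
    aucs = []
    for train, test in repeated_stratified_folds(labels, k, repeats, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, features[i]) for i in test]
        aucs.append(auroc([labels[i] for i in test], scores))
    return mean(aucs)
```

In use, `fit` would train a classifier (for example, a logistic regression from a statistics library) on the training rows and `predict` would return its score for one held-out row. `mean_cv_auroc` then averages the held-out AUROC over every fold of every repeat.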
RESULTS
A logistic regression model using only acoustic and linguistic speech features performed significantly better than a random model (mean AUROC = 0.57, SD = 0.03, P < .001). The model constructed on the female samples achieved a mean AUROC of 0.55 (SD = 0.05, P = .01), and the model constructed on the male samples achieved a mean AUROC of 0.57 (SD = 0.07, P = .002). On the all-sample dataset, the mean AUROC increased to 0.62 (SD = 0.03, P < .001) when demographic information (age, sex, and income) was included, indicating the importance of demographics when screening for anxiety disorders. Performance on the female dataset also increased to 0.62 (SD = 0.04, P < .001) when demographic information (age and income) was included. In both cases, age contributed most to the gain in prediction performance. Adding demographic information did not improve the performance of the model constructed on the male samples.
CONCLUSIONS
Acoustic and linguistic speech features can predict GAD with above-chance performance. Importantly, adding basic demographic variables further improves model performance, suggesting a role for speech and demographic information as automated, objective screeners for GAD.