A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission

Published: 2021-09-03

Authors:

Hansen Lasse, Zhang Yan-Ping, Wolf Detlef, Sechidis Konstantinos, Ladegaard Nicolai, Fusaroli Riccardo

Abstract

Objective

Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability on external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: we trained machine learning models on easily accessible non-clinical datasets and tested them on novel clinical data in a different language.

Methods

A Mixture-of-Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. Using recorded clinical interviews, we examined the model's ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25). The model was evaluated on raw, de-noised, and speaker-diarized data.

Results

The model separated healthy controls from depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20–30 seconds of speech might be enough to screen a patient accurately. Background noise (but not speaker diarization) heavily impacted predictions.

Conclusion

A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.

Significant outcomes

- Using a speech emotion recognition model trained on other languages, we predicted the presence of MDD with an AUC of 0.71.

- The speech emotion recognition model could accurately detect changes in voice after patients achieved remission from MDD.

- Preprocessing steps, particularly background noise removal, greatly influenced classification performance.
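The abstract summarizes group separation as an AUC of 0.71. As a hedged illustration of what that number measures, the sketch below pools hypothetical segment-level "sad" probabilities into one score per participant by simple averaging (an assumption; the abstract does not state how predictions were aggregated) and then computes ROC AUC in its rank (Mann-Whitney) form: the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The names `participant_scores` and `roc_auc` and all numbers are illustrative, not taken from the paper or its repository; scikit-learn's `roc_auc_score` computes the same quantity for binary labels.

```python
from collections import defaultdict
from statistics import mean


def participant_scores(segments):
    """Average segment-level P(sad) into one score per participant.

    `segments` is an iterable of (participant_id, p_sad) pairs.
    Mean pooling is an illustrative assumption, not the paper's stated rule.
    """
    by_id = defaultdict(list)
    for pid, p_sad in segments:
        by_id[pid].append(p_sad)
    return {pid: mean(vals) for pid, vals in by_id.items()}


def roc_auc(scores_pos, scores_neg):
    """ROC AUC as P(random positive outranks random negative); ties count 0.5.

    Here "positive" would be a depressed patient, expected to receive a
    higher "sad" score than a healthy control.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical per-participant scores: 5 of 6 patient/control pairs
    # are ranked correctly, so the AUC is 5/6.
    patients = [0.9, 0.6, 0.4]
    controls = [0.5, 0.3]
    print(roc_auc(patients, controls))  # → 0.8333333333333334
```

With 42 controls and 40 patients the double loop makes only 1,680 comparisons, so the quadratic form is kept for readability; a sort-based rank computation would be preferable for much larger samples.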

Limitations

- No data were available from non-remitters, so changes in voice for that group could not be assessed.

- It is unclear how well the model would generalize beyond Germanic languages.

Data availability statement

Due to the nature of the data (autobiographical interviews in a clinical population), the recordings of the participants cannot be shared publicly. The aggregated model predictions and the code used to run the analyses are available at https://github.com/HLasse/SERDepressionDetection.

Publisher

Cold Spring Harbor Laboratory

Cited by 3 articles.

1. Identifying medications underlying communication atypicalities in psychotic and affective disorders: A pharmacovigilance study within the FDA Adverse Event Reporting System; 2022-09-06

2. Vocal markers of autism: Assessing the generalizability of machine learning models; Autism Research; 2022-04-06

3. Vocal markers of autism: Assessing the generalizability of machine learning models; 2021-11-24