A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission

Authors:

Lasse Hansen, Yan-Ping Zhang, Detlef Wolf, Konstantinos Sechidis, Nicolai Ladegaard, Riccardo Fusaroli

Abstract

Objective
Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability on external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: we trained machine learning models on easily accessible non-clinical datasets and tested them on novel clinical data in a different language.

Methods
A Mixture-of-Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. We examined the model's ability to classify the presence of depression in Danish-speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25), based on recorded clinical interviews. The model was evaluated on raw, de-noised, and speaker-diarized data.

Results
The model separated healthy controls from depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20-30 seconds of speech might be enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions.

Conclusion
A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.

Significant outcomes
- Using a speech emotion recognition model trained on other languages, we predicted the presence of MDD with an AUC of 0.71.
- The speech emotion recognition model could accurately detect changes in voice after patients achieved remission from MDD.
- Preprocessing steps, particularly background noise removal, greatly influenced classification performance.

Limitations
- No data from non-remitters were available, so changes in voice for that group could not be assessed.
- It is unclear how well the model would generalize beyond Germanic languages.

Data availability statement
Due to the nature of the data (autobiographical interviews in a clinical population), the recordings of the participants cannot be shared publicly. The aggregated model predictions and the code used to run the analyses are available at https://github.com/HLasse/SERDepressionDetection.
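The headline metric above is the AUC, which has a direct probabilistic reading: the chance that a randomly chosen patient's model score outranks a randomly chosen control's score. The sketch below illustrates this computation only; the scores are hypothetical illustrative values, not the study's data, and in the study the per-interview score would be the model's inferred probability of "sad" affect.

```python
def auc(pos_scores, neg_scores):
    """AUC = probability that a randomly drawn positive-class score
    outranks a randomly drawn negative-class score (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-interview "sad" probabilities for illustration only
depressed = [0.72, 0.65, 0.58, 0.81]   # patients at first visit
controls = [0.35, 0.50, 0.61, 0.28]    # healthy controls

print(round(auc(depressed, controls), 2))  # prints 0.94
```

An AUC of 0.5 would mean the scores of the two groups are indistinguishable, which is the pattern the study reports for remitted patients versus controls; 0.71 (controls vs. first-visit patients) indicates moderate separation.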

Publisher

Cold Spring Harbor Laboratory
