Depression recognition using voice-based pre-training model-Reference-Cited by-同舟云学术

Depression recognition using voice-based pre-training model

Published:2024-06-03 Issue:1 Volume:14 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Huang Xiangsheng,Wang Fang,Gao Yuan,Liao Yilong,Zhang Wenjing,Zhang Li,Xu Zhenrong

Abstract

AbstractThe early screening of depression is highly beneficial for patients to obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio. Additionally, a small fine-tuning network was used as a classification model to output depression classification results. Subsequently, the proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-classification, with an accuracy of 0.9481 and an RMSE of 0.3810. The wav2vec 2.0 model was first used for depression recognition and showed strong generalization ability. The method is simple, practical, and applicable, which can assist doctors in the early screening of depression.

Funder

Fundamental Research Funds for the Central Universities of South-Central Minzu University

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-63556-0.pdf

Reference49 articles.

1. Wang, Z. et al. Recognition of audio depression based on convolutional neural network and generative antagonism network model. IEEE Access 8, 101181–101191 (2020).

2. Janardhan, N. & Kumaresh, N. Improving depression prediction accuracy using fisher score-based feature selection and dynamic ensemble selection approach based on acoustic features of speech. Traitement du Signal 39(1), 87 (2022).

3. Solieman, H., Pustozerov, E. A. The detection of depression using multimodal models based on text and voice quality features. In 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus) (IEEE, 2021).

4. Li, X. et al. Depression recognition using machine learning methods with different feature generation strategies. Artif. Intell. Med. 99, 101696 (2019).

5. Wollenhaupt-Aguiar, B. et al. Differential biomarker signatures in unipolar and bipolar depression: A machine learning approach. Aust. N. Z. J. Psychiatry 54(4), 393–401 (2020).