Joint Enhancement and Classification Constraints for Noisy Speech Emotion Recognition

Published: 2023-05-25

Author:

SUN Linhui¹, WANG Shun¹, CHEN Shuaitong¹, ZHAO Min¹, LI Pingan¹

Affiliation:

1. Nanjing University of Posts and Telecommunications

Abstract

In natural environments, the received speech signal is often corrupted by noise, which degrades the performance of speech emotion recognition (SER) systems. To address this, a noisy SER method based on joint constraints, namely an enhancement constraint and an arousal-valence classification constraint (EC-AVCC), is proposed. The method extracts multi-domain statistical features (MDSF) as input to an SER model built on a convolutional neural network and attention-based long short-term memory (CNN-ALSTM) under the joint EC-AVCC. The model is jointly constrained by speech enhancement (SE) and arousal-valence classification (AVC) so that it learns features that remain robust for SER in noisy environments. In addition, in the auxiliary SE task, a joint loss function simultaneously constrains the error of the ideal ratio mask (IRM) and the error of the corresponding MDSF, yielding more robust features. The proposed method requires no noise-reduction preprocessing. Under the joint constraints, it obtains robust and discriminative deep emotion features that improve emotion recognition in noisy environments. Experimental results on the CASIA and EMO-DB datasets show that, compared with the baseline, the proposed method improves SER accuracy under white noise and babble noise by 4.7% to 9.9%.
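The abstract describes the training objective only in words. The following is a minimal NumPy sketch, assuming the constraints are combined as a weighted sum of losses; the abstract does not give the exact form. The IRM uses the common definition (S²/(S²+N²))^β with β = 0.5. The functions `se_loss` and `joint_loss`, the weights `alpha`, `lambda_avc` and `lambda_se`, the number of arousal-valence classes, and the random arrays standing in for MDSF are all hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def ideal_ratio_mask(clean_power, noise_power, beta=0.5, eps=1e-12):
    """Common IRM definition: (|S|^2 / (|S|^2 + |N|^2))^beta per time-frequency bin."""
    return (clean_power / (clean_power + noise_power + eps)) ** beta


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (batch, classes), integer labels (batch,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def se_loss(mask_pred, mask_true, mdsf_pred, mdsf_clean, alpha=1.0):
    """Hypothetical SE-task loss: IRM error plus alpha-weighted MDSF error."""
    return mse(mask_pred, mask_true) + alpha * mse(mdsf_pred, mdsf_clean)


def joint_loss(emo_logits, emo_labels, av_logits, av_labels, se_term,
               lambda_avc=0.5, lambda_se=0.5):
    """Hypothetical joint objective: emotion CE + weighted AVC CE + weighted SE loss."""
    return (cross_entropy(emo_logits, emo_labels)
            + lambda_avc * cross_entropy(av_logits, av_labels)
            + lambda_se * se_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean_pow = rng.random((50, 257))  # toy |S|^2 spectrogram (frames x bins)
    noise_pow = rng.random((50, 257))  # toy |N|^2 spectrogram
    irm = ideal_ratio_mask(clean_pow, noise_pow)
    # Toy "predicted" mask: true IRM plus small noise, clipped to [0, 1].
    mask_est = np.clip(irm + 0.05 * rng.standard_normal(irm.shape), 0.0, 1.0)
    mdsf_clean = rng.random((50, 64))  # stand-in for MDSF of clean speech
    mdsf_est = mdsf_clean + 0.1 * rng.standard_normal(mdsf_clean.shape)
    se = se_loss(mask_est, irm, mdsf_est, mdsf_clean)
    emo_logits = rng.standard_normal((8, 6))  # toy 6-class emotion head
    emo_labels = rng.integers(0, 6, size=8)
    av_logits = rng.standard_normal((8, 4))   # hypothetical 4 arousal-valence classes
    av_labels = rng.integers(0, 4, size=8)
    print("SE loss:", se)
    print("joint loss:", joint_loss(emo_logits, emo_labels, av_logits, av_labels, se))
```

Treating SE and AVC as weighted auxiliary terms is the usual multi-task pattern: the shared encoder receives gradients from all three losses, while only the emotion head is needed at inference time.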

Publisher

Research Square Platform LLC
