Speech Emotion Recognition Using RA-Gmlp Model on Time–Frequency Domain Features Extracted by TFCM

Published: 2024-01-31
Volume: 13, Issue: 3, Page: 588
ISSN: 2079-9292
Journal: Electronics
Language: en

Author:

Sha Mo¹, Yang Wenzhong¹, Wei Fuyuan¹, Lu Zhifeng², Chen Mingliang³, Ma Chengji¹, Zhang Linlu¹, Shi Houwang¹

Affiliation:

1. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China

2. College of Information Technology, Xinjiang Teacher’s College (Xinjiang Education Institute), Urumqi 830043, China

3. School of Software, Xinjiang University, Urumqi 830091, China

Abstract

Speech emotion recognition (SER) is a key branch in the field of artificial intelligence, focusing on the analysis and understanding of emotional content in human speech. It involves a multidisciplinary knowledge of acoustics, phonetics, linguistics, pattern recognition, and neurobiology, aiming to establish a connection between human speech and emotional expression. This technology has shown broad application prospects in the medical, educational, and customer service fields. With the evolution of deep learning and neural network technologies, SER research has shifted from relying on manually designed low-level descriptors (LLDs) to utilizing complex neural network models for extracting high-dimensional features. A perennial challenge for researchers has been how to comprehensively capture the rich emotional features. Given that emotional information is present in both time and frequency domains, our study introduces a novel time–frequency domain convolution module (TFCM) based on Mel-frequency cepstral coefficient (MFCC) features to deeply mine the time–frequency information of MFCCs. In the deep feature extraction phase, for the first time, we have introduced hybrid dilated convolution (HDC) into the SER field, significantly expanding the receptive field of neurons, thereby enhancing feature richness and diversity. Furthermore, we innovatively propose the residual attention-gated multilayer perceptron (RA-GMLP) structure, which combines the global feature recognition ability of GMLP with the concentrated weighting function of the multihead attention mechanism, effectively focusing on the key emotional information within the speech sequence. Through extensive experimental validation, we have demonstrated that TFCM, HDC, and RA-GMLP surpass existing advanced technologies in enhancing the accuracy of SER tasks, fully showcasing the powerful advantages of the modules we proposed.

Funder

“Tianshan Talent” Research Project of Xinjiang

National Natural Science Foundation of China

National Key R&D Program of China

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/3/588/pdf

References: 36 articles.

1. The relation between vocal pitch and vocal emotion recognition abilities in people with autism spectrum disorder and typical development; Schelinski; J. Autism Dev. Disord., 2019

2. Emotional speech processing deficits in bipolar disorder: The role of mismatch negativity and P3a; Paris; J. Affect. Disord., 2018

3. A decision support system for service recovery in affective computing: An experimental investigation; Hsieh; Knowl. Inf. Syst., 2020

4. Lampropoulos, A.S., and Tsihrintzis, G.A. (2012, January 18–20). Evaluation of MPEG-7 descriptors for speech emotional recognition. Proceedings of the 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Piraeus-Athens, Greece.

5. Emotion recognition: Empirical studies towards the combination of audio-lingual and visual-facial modalities through multi-attribute decision making; Virvou; Int. J. Artif. Intell. Tools, 2012

Cited by 2 articles.

1. A Combined CNN Architecture for Speech Emotion Recognition; Sensors; 2024-09-06

2. Speech Emotion Recognition Using Dual-Stream Representation and Cross-Attention Fusion; Electronics; 2024-06-04