Speech Emotion Recognition algorithm based on deep learning algorithm fusion of temporal and spatial features

Published:2021-03-01 Issue:1 Volume:1861 Page:012064
ISSN:1742-6588
Container-title:Journal of Physics: Conference Series
Short-container-title:J. Phys.: Conf. Ser.

Author:

An Xu Dong, Ruan Zhou

Abstract

In recent years, human-computer interaction systems have gradually entered our lives. As one of the key technologies in these systems, Speech Emotion Recognition (SER) can identify emotions accurately and help machines better understand users’ intentions, improving the quality of human-computer interaction; it has therefore received considerable attention from researchers at home and abroad. With the successful application of deep learning to image recognition and speech recognition, scholars have begun applying it to SER and have proposed many deep learning-based SER algorithms. In this paper, we study these algorithms in depth and find that they suffer from overly simple feature extraction methods, low utilization of human-designed features, high model complexity, and low accuracy in recognizing specific emotions. For data processing, we quadruple the RAVDESS dataset using additive white Gaussian noise (AWGN), for a total of 5760 audio samples. For the network structure, we build two parallel convolutional neural networks (CNNs) to extract spatial features and a transformer encoder network to extract temporal features, classifying each sample into one of 8 emotion classes. Combining the CNN’s strength in spatial feature representation with the transformer encoder’s sequence encoding, we obtain an accuracy of 80.46% on the hold-out test set of the RAVDESS dataset.
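
The AWGN augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the noise level or say whether the clean clip is kept, so the 20 dB signal-to-noise ratio, the "original plus three noisy copies" scheme, and the function names `add_awgn` and `augment_fourfold` are all assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Return `signal` plus white Gaussian noise at the requested SNR (in dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def augment_fourfold(signal, snr_db=20.0, seed=0):
    """Assumed scheme: keep the clean clip and add three independent noisy copies."""
    rng = np.random.default_rng(seed)
    clean = np.asarray(signal, dtype=np.float64)
    noisy = [add_awgn(clean, snr_db, rng) for _ in range(3)]
    return [clean] + noisy
```

Under this assumed scheme each clip yields four samples, so the 5760 samples reported in the abstract would correspond to 1440 source clips.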

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/1861/1/012064/pdf

References: 12 articles.

1. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011[J];Anagnostopoulos;Artificial Intelligence Review,2015

2. Speech emotion recognition using hidden Markov models[J];Nwe;Speech communication,2003

Cited by 16 articles.

1. A lightweight and privacy preserved federated learning ecosystem for analyzing verbal communication emotions in identical and non-identical databases;Measurement: Sensors;2024-08

2. Disruptive situation detection on public transport through speech emotion recognition;Intelligent Systems with Applications;2024-03

3. Evaluation of English Speech Interaction Quality Based on Deep Learning;2024 International Conference on Integrated Circuits and Communication Systems (ICICACS);2024-02-23

4. Emotion Variation Detection in Discrete English Speech: A Wavelet Transform Use Case in Mental Health Monitoring;Proceedings of the 2024 Australasian Computer Science Week;2024-01-29

5. Machine Learning Approach for Detection of Speech Emotions for RAVDESS Audio Dataset;2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT);2024-01-11