Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning

Published: 2022-03-19 Issue: 6 Volume: 22 Page: 2378
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Aggarwal Apeksha, Srivastava Akshat, Agarwal Ajay, Chahal Nidhi, Singh Dilbag, Alnuaim Abeer Ali, Alhadlaq Aseel, Lee Heung-No

Abstract

Recognizing human emotions by machine is a complex task. Deep learning models attempt to automate this process by giving machines the ability to learn. However, identifying human emotions from speech with good performance remains challenging, even though deep learning algorithms have recently been applied to the problem. Most past research relied on a single feature extraction method for training. In this research, we explore two different methods of extracting features for effective speech emotion recognition. Two-way feature extraction is proposed, utilizing super convergence to extract two sets of potential features from the speech data. For the first feature set, principal component analysis (PCA) is applied, and a deep neural network (DNN) with dense and dropout layers is then implemented. In the second approach, mel-spectrogram images are extracted from the audio files, and the 2D images are given as input to a pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis of both feature extraction methods, with multiple algorithms and over two datasets, are performed in this work. The RAVDESS dataset provided significantly better accuracy than using numeric features on a DNN.
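
The mel-spectrogram step mentioned in the abstract depends on spacing frequency bands evenly on the mel scale instead of in hertz. The sketch below is illustrative only and is not the authors' code. It assumes the common HTK-style mel formula, and the function names and the band count and frequency range in the usage note are hypothetical choices; the abstract does not state the paper's actual settings.

```python
import math


def hz_to_mel(f_hz):
    """Convert hertz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 edge frequencies in Hz, equally spaced in mels.

    Consecutive triples of edges define the triangular filters that
    turn a linear-frequency spectrogram into a mel-spectrogram.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
```

For example, `mel_band_edges(8, 0.0, 8000.0)` returns ten edge frequencies whose spacing in hertz widens toward the high end, so low frequencies get finer resolution. In practice an audio library would compute the full spectrogram, and the resulting 2D array would be rendered as an image before being passed to an image model.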

Funder

Ministry of Science and ICT Korea

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/6/2378/pdf

Reference: 41 articles.

1. Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding

2. Speech Emotion Recognition Using Deep Learning Techniques: A Review

3. Speech Emotion Recognition using Neural Network and MLP Classifier;Joy;IJESC,2020

4. Voice emotion recognition using CNN and decision tree;Damodar;Int. J. Innov. Technol. Exp. Eng.,2019

5. Vocal-based emotion recognition using random forests and decision tree

Cited by 63 articles.

1. Fusion of PCA and ICA in Statistical Subset Analysis for Speech Emotion Recognition;Sensors;2024-09-02

2. Cross-Lingual Transfert Learning for Speech Emotion Recognition;2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP);2024-07-11

3. Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network);International Journal of Speech Technology;2024-07-10

4. Navigating the Multimodal Landscape: A Review on Integration of Text and Image Data in Machine Learning Architectures;Machine Learning and Knowledge Extraction;2024-07-09

5. Newman-Watts-Strogatz topology in deep echo state networks for speech emotion recognition;Engineering Applications of Artificial Intelligence;2024-07