Speech Emotion Recognition Using Deep Feedforward Neural Network

Published: 2018-05-01
Volume: 10, Issue: 2, Page: 554
ISSN: 2502-4760
Container-title: Indonesian Journal of Electrical Engineering and Computer Science
Short-container-title: IJEECS

Authors:

Alghifari Muhammad Fahreza, Gunawan Teddy Surya, Kartiwi Mira

Abstract

Speech emotion recognition (SER) is currently a research hotspot because of its challenging nature and promising future prospects. The objective of this research is to use Deep Neural Networks (DNNs) to recognize emotion in human speech. First, the chosen speech features, Mel-frequency cepstral coefficients (MFCCs), were extracted from raw audio data. Second, the extracted features were fed into the DNN to train the network. The trained network was then tested on a set of labelled emotional speech recordings, and the recognition rate was evaluated. Based on the accuracy rate, the number of MFCCs, neurons, and layers was adjusted for optimization. In addition, a custom-made database is introduced and validated using the optimized network. The optimum configuration for SER is 13 MFCCs, 12 neurons, and 2 layers for 3 emotions, and 25 MFCCs, 21 neurons, and 4 layers for 4 emotions, achieving total recognition rates of 96.3% for 3 emotions and 97.1% for 4 emotions.
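The pipeline outlined in the abstract (MFCC features in, a feedforward DNN out, with the number of MFCCs, neurons, and layers as tuning knobs) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: "N layers of M neurons" is read as N hidden layers of M ReLU units each, a softmax output with one unit per emotion is assumed, each utterance's MFCC frames are assumed to be averaged into a single vector, and all function names (`utterance_features`, `init_mlp`, `forward`, `train_step`, `count_params`) are hypothetical. MFCC extraction itself is left to an audio library.

```python
import numpy as np


def utterance_features(mfcc_frames):
    """Collapse a (frames, n_mfcc) MFCC matrix into one vector by averaging
    over time. Assumed pooling step, not taken from the paper."""
    return np.asarray(mfcc_frames).mean(axis=0)


def init_mlp(n_mfcc, n_neurons, n_layers, n_classes, seed=0):
    """Weights for a feedforward net: n_mfcc inputs, n_layers hidden layers of
    n_neurons each (assumed reading of the abstract), n_classes outputs."""
    rng = np.random.default_rng(seed)
    sizes = [n_mfcc] + [n_neurons] * n_layers + [n_classes]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialisation, a common choice for ReLU hidden units
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)


def forward(params, X):
    """Return softmax class probabilities and the input to every layer."""
    acts = [X]
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layer
        acts.append(h)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, acts


def train_step(params, X, y, lr=0.05):
    """One full-batch gradient-descent step on cross-entropy loss.
    Returns the updated parameters and the loss before the update."""
    probs, acts = forward(params, X)
    n = X.shape[0]
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    grad = probs.copy()
    grad[np.arange(n), y] -= 1.0  # d(loss)/d(logits) for softmax + CE
    grad /= n
    new_params = list(params)
    for i in range(len(params) - 1, -1, -1):
        W, b = params[i]
        gW = acts[i].T @ grad
        gb = grad.sum(axis=0)
        if i > 0:
            # propagate to the previous layer through its ReLU
            grad = (grad @ W.T) * (acts[i] > 0)
        new_params[i] = (W - lr * gW, b - lr * gb)
    return new_params, loss


if __name__ == "__main__":
    # The two configurations reported in the abstract, under the
    # hidden-layer reading assumed above: (n_mfcc, n_neurons, n_layers, n_classes)
    for cfg in [(13, 12, 2, 3), (25, 21, 4, 4)]:
        print(cfg, count_params(init_mlp(*cfg)))
```

Under this reading, the 3-emotion network has 363 trainable parameters and the 4-emotion network has 2,020. The tuning described in the abstract would then amount to looping over `(n_mfcc, n_neurons, n_layers)`, training each candidate, and keeping the setting with the best held-out recognition rate.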

Publisher

Institute of Advanced Engineering and Science

Subject

Electrical and Electronic Engineering, Control and Optimization, Computer Networks and Communications, Hardware and Architecture, Information Systems, Signal Processing

Cited by 12 articles.

1. Multimodal information fusion method in emotion recognition in the background of artificial intelligence;Internet Technology Letters;2024-03-12

2. Speech Emotion Recognition: A Comprehensive Survey;Wireless Personal Communications;2023-03-08

3. Inertial Sensors for Human Motion Analysis: A Comprehensive Review;IEEE Transactions on Instrumentation and Measurement;2023

4. English Speech Recognition Hybrid Algorithm Based on BP Neural Network;Cyber Security Intelligence and Analytics;2023

5. Predictive analysis of the psychological state of charismatic leaders on employees' work attitudes based on artificial intelligence affective computing;Frontiers in Psychology;2022-09-23