Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders

Authors:

Pulatov Ilkhomjon 1, Oteniyazov Rashid 2, Makhmudov Fazliddin 1, Cho Young-Im 1

Affiliations:

1. Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea

2. Department of Telecommunication Engineering, Nukus Branch of Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Nukus 230100, Uzbekistan

Abstract

Recognizing emotional cues in human speech is a crucial aspect of human–computer interaction, and extracting emotionally relevant features from speech is central to that task. The objective of this study was to design a speech emotion recognition framework based on spectrogram and semantic feature encoders, addressing notable shortcomings in existing methods in order to improve recognition accuracy. Two complementary strategies were used to obtain discriminative features. First, a fully convolutional neural network model encodes speech spectrograms. Second, Mel-frequency cepstral coefficient (MFCC) features are extracted and combined with Speech2Vec semantic embeddings. The two feature streams are processed separately before being fed into a long short-term memory (LSTM) network and a fully connected layer for further representation learning, strengthening the model's ability to recognize and interpret emotion from human speech. The proposed system was rigorously evaluated on two distinct databases, RAVDESS and EMO-DB, where it outperformed established models, achieving 94.8% accuracy on the RAVDESS dataset and 94.0% on the EMO-DB dataset.
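As a rough illustration of the dual-encoder pipeline described in the abstract, the sketch below wires toy stand-ins for each stage together in NumPy: a convolutional branch over a spectrogram, an MFCC-plus-semantic-embedding branch, a single LSTM cell step, and a fully connected output layer. All shapes (40 mel bands, 13 MFCCs, a 50-dimensional Speech2Vec-style embedding, a 16-unit LSTM, 8 emotion classes) and the random weights are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_branch(spectrogram, kernel):
    """Toy stand-in for the fully convolutional spectrogram encoder:
    valid 1-D convolution along time, then global average pooling."""
    out = np.stack([np.convolve(row, kernel, mode="valid")
                    for row in spectrogram])
    return out.mean(axis=1)              # one pooled value per mel band

def semantic_branch(mfcc, word_embedding):
    """Toy stand-in for the MFCC + Speech2Vec semantic encoder:
    average MFCCs over time, then concatenate the semantic embedding."""
    return np.concatenate([mfcc.mean(axis=1), word_embedding])

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (input/forget/cell/output gates)."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
    g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
    c = f * c + i * g
    return o * np.tanh(c), c

# Hypothetical inputs: 40 mel bands x 100 frames, 13 MFCCs, 50-d embedding.
spec = rng.standard_normal((40, 100))
mfcc = rng.standard_normal((13, 100))
emb = rng.standard_normal(50)

# Fuse the two encoder outputs into one feature vector (40 + 13 + 50 = 103).
feat = np.concatenate([conv1d_branch(spec, rng.standard_normal(5)),
                       semantic_branch(mfcc, emb)])

# LSTM + fully connected layer over 8 emotion classes (random weights).
H = 16
W = rng.standard_normal((4 * H, feat.size))
U = rng.standard_normal((4 * H, H))
h, c = lstm_step(feat, np.zeros(H), np.zeros(H), W, U,
                 rng.standard_normal(4 * H))
logits = rng.standard_normal((8, H)) @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(feat.size, probs.shape)
```

In the actual system, the convolutional, LSTM, and fully connected weights would of course be learned from labeled speech data rather than drawn at random; the sketch only shows how the two feature streams are fused before the recurrent and dense layers.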

Funder

Korea Agency for Technology and Standards in 2022

Ministry of Oceans and Fisheries

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry


Cited by 6 articles.

1. Optimizing Economic Dispatch for Microgrid Clusters Using Improved Grey Wolf Optimization. Electronics, 2024-08-08.

2. BSER: A Learning Framework for Bangla Speech Emotion Recognition. 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), 2024-05-02.

3. EmotionNet: Pioneering Deep Learning Fusion for Real-Time Speech Emotion Recognition with Convolutional Neural Networks. 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), 2024-05-02.

4. A Comprehensive Exploration of Stack Ensembling Techniques for Amazon Product Review Sentiment Analysis. 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024-03-14.

5. Multimodal Emotion Recognition Using Bi-LG-GCN for MELD Dataset. Balkan Journal of Electrical and Computer Engineering, 2024-03-01.
