Author:
Qi Yingmei, Huang Heming, Zhang Huiyun
Abstract
Speech emotion recognition is an important research direction in speech recognition. To improve its performance, researchers have worked extensively on data augmentation, feature extraction, and model design. To address limited speech data and overfitting during model training, this paper proposes A-CapsNet, a neural network model based on data augmentation. To mitigate data scarcity, noise from the Noisex-92 database is first used for augmentation in combination with four data division methods: emotion-independent random division (EIRD), emotion-dependent random division (EDRD), emotion-independent cross-validation (EICV), and emotion-dependent cross-validation (EDCV). The EMODB database is then used to analyze and compare the performance of the proposed model under different signal-to-noise ratios, and the results show that both the proposed model and the data augmentation are effective.
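The abstract does not spell out how noise is mixed with the speech, so the following is only a minimal sketch of one common way to add noise at a chosen signal-to-noise ratio. The function names `mix_at_snr` and `augment`, the SNR levels, and the choice to tile or trim the noise to the utterance length are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper: the paper's exact mixing procedure is not given here.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Tile or trim the noise so it covers the whole utterance (an assumption).
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Pick the gain so that
    #   speech_power / (gain**2 * noise_power) == 10 ** (snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def augment(speech, noise, snr_levels_db=(0, 5, 10)):
    """Build one noisy copy of the utterance per SNR level (levels are illustrative)."""
    return {snr: mix_at_snr(speech, noise, snr) for snr in snr_levels_db}
```

In a setup like this, each clean utterance from a corpus such as EMODB would yield several noisy copies, one per SNR level, which could then be split into training and test sets by whichever division method is being evaluated.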
Funder
National Natural Science Foundation of China
Natural Science Foundation of Qinghai Province
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
2 articles.
1. Machine Learning Approach for Detection of Speech Emotions for RAVDESS Audio Dataset;2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT);2024-01-11
2. Survey On Medical Image Classification Using CAPSGNN;Recent Research Reviews Journal;2023-06