Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network

Authors:

Tanvi Puri (1), Mukesh Soni (2), Gaurav Dhiman (3,4,5), Osamah Ibrahim Khalaf (6), Malik Alazzam (7), Ihtiram Raza Khan (8)

Affiliation:

1. ICT Ganpat University, Ahmedabad, Gujarat, India

2. Computer Science and Engineering, Jagran Lakecity University, Bhopal, India

3. Department of Computer Science, Government Bikram College of Commerce, Patiala, India

4. University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India

5. Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India

6. Al-Nahrain University, Baghdad, Iraq

7. Lone Star College-Victory Center, Houston, TX, USA

8. Computer Science Department, Jamia Hamdard University, Delhi, India

Abstract

Every human being attaches emotion to the things that matter to them, and a customer's emotion can help a customer-service representative understand their requirements. Speech emotion recognition therefore plays an important role in human interaction, and an intelligent system can improve it. To this end, we design a convolutional neural network (CNN) based model that classifies emotions into broad categories, such as positive and negative, or into more specific ones. In this paper, we use the audio recordings of the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The log Mel spectrogram and Mel-frequency cepstral coefficients (MFCCs) were used to extract features from the raw audio files. Such features have been used to classify emotions with techniques such as long short-term memory (LSTM) networks, CNNs, hidden Markov models (HMMs), and deep neural networks (DNNs). For this paper, we divided the emotions into three classification tasks, for both male and female speakers. In the first task, we divide the emotions into two classes, positive and negative. In the second task, we divide them into three classes: positive, negative, and neutral. In the third task, we divide them into eight classes: neutral, calm, happy, sad, angry, fearful, disgust, and surprised. For these three tasks, we propose a model consisting of eight consecutive 2D convolutional layers. The proposed model outperforms previously published models on these tasks, so consumer emotion can now be identified more reliably.
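The feature extraction the abstract describes, a log Mel spectrogram followed by MFCCs, can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: frame lengths, hop size, and filter counts here are common defaults assumed for the example, and the DCT is the standard unnormalized DCT-II.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    # Frame the signal with a Hann window and compute the power spectrum.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Log mel spectrogram: mel filterbank energies in dB.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    log_mel = 10.0 * np.log10(np.maximum(mel_energy, 1e-10))
    # MFCCs: DCT-II of the log filterbank energies, keeping the first n_mfcc.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_mfcc)[None, :])
    return log_mel, log_mel @ dct
```

For one second of 16 kHz audio with these settings, this yields a (61, 40) log mel spectrogram and (61, 13) MFCC matrix; the 2D spectrogram is what a 2D CNN would take as its input image.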
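The abstract specifies only that the proposed model contains eight consecutive 2D convolutional layers. A hedged PyTorch sketch of such an architecture is shown below; the channel widths, kernel sizes, normalization, and global-average-pooling head are assumptions for illustration, not the authors' published configuration.

```python
import torch
from torch import nn

class EmotionCNN(nn.Module):
    """Hypothetical sketch of an eight-layer 2D CNN emotion classifier.

    Only the layer count comes from the abstract; channel widths, kernel
    sizes, and the pooled linear head are illustrative assumptions.
    """

    def __init__(self, n_classes: int = 8):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (16, 16, 32, 32, 64, 64, 128, 128):  # eight conv layers
            blocks += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.BatchNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse the mel and time axes
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames), e.g. a log mel spectrogram
        h = self.pool(self.features(x)).flatten(1)
        return self.classifier(h)
```

Setting `n_classes` to 2, 3, or 8 covers the paper's three classification tasks (positive/negative; positive/negative/neutral; the eight RAVDESS emotions).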

Publisher

Hindawi Limited

Subject

Health Informatics, Biomedical Engineering, Surgery, Biotechnology

Cited by 25 articles.

1. An enhanced speech emotion recognition using vision transformer;Scientific Reports;2024-06-07

2. Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases;Journal of Systems Science and Systems Engineering;2024-05-29

3. Enhancing masked facial expression recognition with multimodal deep learning;Multimedia Tools and Applications;2024-02-13

4. Machine Learning Approach for Detection of Speech Emotions for RAVDESS Audio Dataset;2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT);2024-01-11

5. Detecting and Analyzing the Emotional Levels of a Person Through CBT Using MFCC and Lexicon-Based Approach;Lecture Notes in Networks and Systems;2024
