Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest

Authors:

Mohammad Mahdi Rezapour Mashhadi, Kofi Osei-Bonsu

Abstract

Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted several types of audio features, including Mel-frequency cepstral coefficients, chromagram, Mel-scale spectrogram, spectral contrast, Tonnetz representation, and zero-crossing rate. We used a limited speech emotion recognition (SER) dataset and augmented it with additional audio samples. In contrast to many previous studies, we combined all audio files before conducting our analysis. We compared the performance of two models: a one-dimensional convolutional neural network (conv1D) and a random forest (RF) with RF-based feature selection. Our results showed that RF with feature selection achieved higher average accuracy (69%) than conv1D, with the highest precision for fear (72%) and the highest recall for calm (84%). Our study demonstrates the effectiveness of RF with feature selection for speech emotion classification on a limited dataset. We found that, for both algorithms, anger is most often misclassified as happy, disgust as sad or neutral, and fear as sad. This could be due to the similarity of some acoustic features across these emotions, such as pitch, intensity, and tempo.
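The abstract lists zero-crossing rate among the extracted features. As a minimal illustration of what such a feature measures, the pure-Python sketch below computes the per-frame zero-crossing rate of a synthetic sine wave; the frame and hop lengths are illustrative defaults, not the paper's settings, and the authors' actual pipeline (and the other features such as MFCCs) would typically rely on an audio library rather than this from-scratch code.

```python
import math

def zero_crossing_rate(signal, frame_length=2048, hop_length=512):
    """Fraction of adjacent-sample sign changes within each frame.

    frame_length and hop_length are illustrative defaults, not values
    taken from the paper.
    """
    rates = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        crossings = sum(
            1 for a, b in zip(frame, frame[1:])
            if (a >= 0.0) != (b >= 0.0)
        )
        rates.append(crossings / (frame_length - 1))
    return rates

# Synthetic check: a 440 Hz sine sampled at 22050 Hz crosses zero
# about 2 * 440 = 880 times per second, so the per-sample rate is
# roughly 880 / 22050, i.e. close to 0.04.
sr = 22050
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
rates = zero_crossing_rate(signal)
print(round(sum(rates) / len(rates), 3))
```

Because unvoiced or noisy sounds change sign far more often than voiced speech, this single number already carries some emotion-relevant information about the excitation of the signal, which is why it appears alongside the spectral features in the study.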

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary


Cited by 4 articles.
