Speech Emotion Recognition Using Audio Matching

Published:2022-11-29 Issue:23 Volume:11 Page:3943
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Chaturvedi Iti, Noel Tim, Satapathy Ranjan

Abstract

It has become popular for people to share their opinions about products on TikTok and YouTube. Automatic sentiment extraction on a particular product can assist users in making buying decisions. For videos in languages such as Spanish, the tone of voice can be used to determine sentiments, since the translation is often unknown. In this paper, we propose a novel algorithm to classify sentiments in speech in the presence of environmental noise. Traditional models rely on pretrained audio feature extractors for humans that do not generalize well across different accents. In this paper, we leverage the vector space of emotional concepts where words with similar meanings often have the same prefix. For example, words starting with ‘con’ or ‘ab’ signify absence and hence negative sentiments. Augmentations are a popular way to amplify the training data during audio classification. However, some augmentations may result in a loss of accuracy. Hence, we propose a new metric based on eigenvalues to select the best augmentations. We evaluate the proposed approach on emotions in YouTube videos and outperform baselines in the range of 10–20%. Each neuron learns words with similar pronunciations and emotions. We also use the model to determine the presence of birds from audio recordings in the city.

Funder

College of Science and Engineering at James Cook University, Australia

IHPC Singapore

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/23/3943/pdf

Reference: 39 articles.

1. Statistical approaches to concept-level sentiment analysis;Cambria;IEEE Intell. Syst.,2013

2. Latif, S., Cuayáhuitl, H., Pervez, F., Shamshad, F., Ali, H.S., and Cambria, E. (2022). A Survey on Deep Reinforcement Learning for Audio-Based Applications. Artif. Intell. Rev., 1–48.

3. Cognitive insights into sentic spaces using principal paths;Ragusa;Cogn. Comput.,2019

4. Polarity and Subjectivity Detection with Multitask Learning and BERT Embedding;Satapathy;Future Internet,2022

5. Toward hardware-aware deep-learning-based dialogue systems;Pandelea;Neural Comput. Appl.,2021

Cited by 4 articles.

1. Barrier Function to Skin Elasticity in Talking Head;Cognitive Computation;2024-08-24

2. Predicting word vectors for microtext;Expert Systems;2024-03-28

3. Informative Speech Features based on Emotion Classes and Gender in Explainable Speech Emotion Recognition;2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW);2023-09-10

4. Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network;Electronics;2023-02-07