Semantic speech analysis using machine learning and deep learning techniques: a comprehensive review

Published: 2023-12-19
ISSN: 1380-7501
Container-title: Multimedia Tools and Applications
Language: en
Short-container-title: Multimed Tools Appl

Author:

Tyagi Suryakant, Szénási Sándor

Abstract

Human cognitive functions such as perception, attention, learning, memory, reasoning, and problem-solving are all significantly influenced by emotion. Emotion has a particularly strong impact on attention, modifying its selectivity and influencing behavior and the motivation to act. Artificial Emotional Intelligence (AEI) technologies enable computers to understand a user's emotional state and respond appropriately, which makes realistic dialogue between people and machines possible. The current generation of adaptive user interface technologies is built on techniques from data analytics and machine learning (ML), notably deep learning (DL) with artificial neural networks (ANN), applied to multimodal data such as videos of facial expressions, stance, and gesture; voice; and bio-physiological data (such as ECG, respiration, EEG, fMRI, EMG, and eye tracking). In this study, we review the existing literature on ML and data analytics techniques used to detect emotions in speech, and we examine the efficacy of these techniques in this area of multimodal data processing and in extracting emotions from speech. The study also analyzes how emotional chatbots, facial expressions, images, and social media texts can be effective in detecting emotions. The PRISMA methodology is used to review the existing surveys. Support Vector Machines (SVM), Naïve Bayes (NB), Random Forests (RF), Recurrent Neural Networks (RNN), and Logistic Regression (LR), among others, are commonly used ML techniques for emotion extraction. This study provides a new taxonomy of the application of ML in speech emotion recognition (SER). The results show that Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) are the most useful methods for this purpose.

Funder

Óbuda University

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Media Technology, Software

Link

https://link.springer.com/content/pdf/10.1007/s11042-023-17769-6.pdf

References (159 articles)

1. Garcia-Garcia JM, Penichet VM, Lozano MD (2017) Emotion detection: a technology review. In: Proceedings of the XVIII international conference on human computer interaction, pp 1–8

2. Todd B, Tucker C, Hopkinson K, Bilén SG (2014) Increasing the veracity of event detection on social media networks through user trust modeling. In: IEEE international conference on big data, pp 636–643

3. Sun S, Luo C, Chen J (2017) A review of natural language processing techniques for opinion mining systems. Inf Fusion 36:10–25

4. Rajput K, Kapoor R, Mathur P, Kumaraguru P, Shah RR (2020) Transfer learning for detecting hateful sentiments in code switched language. Deep learning-based approaches for sentiment analysis, pp 159–192

5. Zhu X, Lou Y, Deng H, Ji D (2022) Leveraging bilingual-view parallel translation for code-switched emotion detection with adversarial dual-channel encoder. Knowl-Based Syst 235:107436