Edge emotion recognition: applying fast Fourier transform on speech Mel spectrograms to classify emotion on a Raspberry Pi for near real-time analytics

Authors:

Dominik Esteves de Andrade1, Rüdiger Buchkremer1

Affiliation:

1. FOM University of Applied Sciences

Abstract

Many people and machines are inherently unable to interpret socio-affective cues such as tone of voice. Thoughtful adoption of intelligent technologies may improve such conversations. Since direct communication often occurs via edge devices, where an additional network connection is not guaranteed, we describe a real-time processing method that captures and evaluates emotions in speech on a terminal device such as the Raspberry Pi computer. In this article, we also present the current state of research on speech emotion recognition. We examine audio files from five important emotional speech databases and visualize them in situ as dB-scaled Mel spectrograms using TensorFlow and Matplotlib. The audio files are transformed with the fast Fourier transform method to generate the spectrograms. For classification, a support vector machine kernel and a CNN with transfer learning are selected. The accuracy of this classification is 70% and 77%, respectively, a good result given that the algorithms run on an edge device rather than on a server. On a Raspberry Pi, evaluating emotion in speech with machine learning, including the corresponding visualization of the speaker's emotional state, took less than one second.
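The pipeline the abstract describes (windowed FFT of the waveform, projection onto Mel filter banks, dB scaling) can be sketched as follows. The paper itself uses TensorFlow and Matplotlib; this is a minimal numpy-only illustration, and all function names, the HTK Mel formula, and the frame parameters (16 kHz, 512-point FFT, hop 256, 40 Mel bands) are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale: m = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale, 0 Hz .. Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram_db(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Short-time FFT: Hann-windowed frames -> power spectrum per frame
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2     # (frames, bins)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T     # project onto Mel bands
    return 10.0 * np.log10(np.maximum(mel, 1e-10))        # dB scale, floored

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram_db(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (n_frames, 40)
```

The resulting matrix is what would be rendered as an image (e.g. with Matplotlib's `imshow`) and passed to the SVM or CNN classifier described in the abstract.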

Publisher

Research Square Platform LLC

