Emotion Recognition from Speech Using the Bag-of-Visual Words on Audio Segment Spectrograms

Published: 2019-02-04 Issue: 1 Volume: 7 Page: 20
ISSN: 2227-7080
Container-title: Technologies
Language: en
Short-container-title: Technologies

Author:

Spyrou Evaggelos, Nikopoulou Rozalia, Vernikos Ioannis, Mylonas Phivos

Abstract

Monitoring and understanding a person’s emotional state plays a key role in current and forthcoming computational technologies. At the same time, this monitoring and analysis should be as unobtrusive as possible, since the digital world is now part of everyday activities. In this context, and specifically when assessing people’s affective state during educational training, the most popular approach is to use sensory equipment that allows them to be observed without any direct contact. In this work, we therefore focus on human emotion recognition from audio stimuli (i.e., human speech), using a novel approach based on a methodology inspired by computer vision, namely the bag-of-visual words method, applied to spectrograms of several audio segments. Each spectrogram is treated as the visual representation of its audio segment and may be analyzed with well-known traditional computer vision techniques, such as construction of a visual vocabulary, extraction of speeded-up robust features (SURF), quantization into a set of visual words, and image histogram construction. As a last step, support vector machine (SVM) classifiers are trained on the resulting representations. Finally, to further generalize the proposed approach, we use publicly available datasets in several human languages to perform cross-language experiments on both actor-created and real-life data.
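
The quantization and histogram steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional descriptors, and the L1 normalization are all assumptions made for illustration. In the described pipeline the descriptors would be SURF features extracted from a spectrogram image, and the vocabulary would come from clustering such descriptors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Return the index of the visual word (centroid) closest to the descriptor."""
    distances = [math.dist(descriptor, word) for word in vocabulary]
    return distances.index(min(distances))


def bovw_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """Quantize descriptors into visual words and build a histogram.

    The histogram is L1-normalized (an assumption of this sketch) so that
    images with different numbers of descriptors remain comparable.
    """
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors found: return an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [count / total for count in counts]


# Hypothetical toy vocabulary of three visual words in 2-D.
# Real SURF descriptors have 64 or 128 dimensions.
toy_vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical descriptors standing in for those of one spectrogram image.
toy_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

toy_histogram = bovw_histogram(toy_descriptors, toy_vocabulary)
print(toy_histogram)
```

Each spectrogram would yield one such fixed-length histogram, which is the kind of vector a classifier such as an SVM can then be trained on.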

Funder

Horizon 2020 Framework Programme

Publisher

MDPI AG

Link

http://www.mdpi.com/2227-7080/7/1/20/pdf

References: 56 articles.

1. Emotion recognition in human-computer interaction

2. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011

3. Survey on speech emotion recognition: Features, classification schemes, and databases

Cited by 19 articles.

1. Optimizing Speech Emotion Recognition with Machine Learning Based Advanced Audio Cue Analysis;Technologies;2024-07-11

2. Smart Detection and Proactive Prevention of Student Depression: A DeepLearning Approach;2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies;2024-03-22

3. Facial Emotion Based Automatic Music Recommender System;2023 International Conference on Sustainable Emerging Innovations in Engineering and Technology (ICSEIET);2023-09-14

4. Time-frequency visual representation and texture features for audio applications: a comprehensive review, recent trends, and challenges;Multimedia Tools and Applications;2023-03-16

5. Intelligent recognition of audio scene based on hybrid attention and parallel deep feature processing under genetic evolutionary computing;Neural Computing and Applications;2023-02-20