Abstract
Speech emotion recognition predicts the emotional state of a speaker from the person’s speech, adding an element needed for more natural human–computer interaction. Earlier studies on emotion recognition relied primarily on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to emotion recognition. Because deep learning automatically extracts salient features correlated with speaker emotion, it offers certain advantages over handcrafted-feature-based methods. There are, however, challenges in applying such networks to emotion recognition, because the data required to properly train deep networks are often lacking. A new deep-learning-based approach is therefore needed that can exploit the information available in a given speech signal to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model combining bidirectional encoder representations from transformers (BERT) and convolutional neural networks (CNNs). Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture (IEMOCAP) emotion corpora, and the proposed method outperformed state-of-the-art techniques in most of the test configurations.
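The parallel-fusion idea named in the abstract can be sketched as follows. This is a minimal illustrative stub, not the paper’s implementation: the CNN and BERT branches are stood in for by simple random projections, and all names and dimensions are hypothetical. The point is only the structure — two branches process the same input in parallel, their outputs are concatenated, and a classifier head predicts emotion probabilities.

```python
import math
import random

random.seed(0)

def linear(x, w):
    """Dense projection: returns w @ x for a weight matrix w given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fusion_sketch(features, n_classes=4):
    """Hypothetical parallel-fusion classifier: two branches, concatenated."""
    d = len(features)
    # Branch A: stand-in for the CNN feature extractor (8-dim output).
    w_cnn = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(8)]
    # Branch B: stand-in for the BERT-style encoder (8-dim output).
    w_bert = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(8)]
    # Parallel fusion: concatenate the two branch outputs into one vector.
    fused = linear(features, w_cnn) + linear(features, w_bert)
    # Classifier head over the fused 16-dim representation.
    w_out = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(n_classes)]
    return softmax(linear(fused, w_out))

probs = fusion_sketch([0.1, -0.3, 0.7, 0.2])
```

In the actual model, each branch would be a trained network operating on log-mel spectrogram frames rather than a random projection; only the fusion-then-classify structure is illustrated here.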
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
23 articles.