A Neural Network Architecture for Children’s Audio–Visual Emotion Recognition

Authors:

Anton Matveev 1 (ORCID), Yuri Matveev 1, Olga Frolova 1, Aleksandr Nikolaev 1, Elena Lyakso 1 (ORCID)

Affiliation:

1. Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia

Abstract

Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems mature, we can move beyond acted adult audio–visual speech to more difficult cases. In this work, we investigate the automatic classification of the audio–visual emotional speech of children, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio–visual ER systems. We first present a new corpus of children’s audio–visual emotional speech that we collected. We then propose a neural network solution that improves the utilization of the temporal relationships between the audio and video modalities in cross-modal fusion for children’s audio–visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and introduce several modifications focused on deeper learning of the cross-modal temporal relationships using attention. In experiments with our proposed approach and the selected baseline model, we observe a relative improvement in performance of 2%. Finally, we conclude that focusing more on the cross-modal temporal relationships may be beneficial for building ER systems for child–machine communication and for environments where qualified professionals work with children.
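The abstract does not detail the architecture, but the core idea of learning cross-modal temporal relationships with attention can be illustrated with a minimal PyTorch sketch. The code below is an assumption-laden illustration, not the authors' model: the module name, feature dimensions, the number of emotion classes, and the use of bidirectional audio-to-video and video-to-audio cross-attention via nn.MultiheadAttention are all hypothetical choices made for clarity.

# Minimal sketch of cross-modal temporal attention fusion (illustrative only).
# Dimensions, module names, and the fusion strategy are assumptions, not the
# paper's exact architecture: each modality's frame sequence attends to the
# other over time before the fused representation is classified.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_emotions=4):
        super().__init__()
        # Each modality queries the other, so temporal alignment is learned
        # rather than imposed by naive frame-level concatenation.
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(2 * dim),
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_emotions),  # num_emotions is a placeholder
        )

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, T_audio, dim); video_feats: (batch, T_video, dim)
        a_attended, _ = self.audio_to_video(audio_feats, video_feats, video_feats)
        v_attended, _ = self.video_to_audio(video_feats, audio_feats, audio_feats)
        # Pool each attended stream over time, then fuse and classify.
        fused = torch.cat([a_attended.mean(dim=1), v_attended.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    audio = torch.randn(2, 120, 256)   # e.g., 120 audio frames per clip
    video = torch.randn(2, 30, 256)    # e.g., 30 video frames per clip
    print(model(audio, video).shape)   # torch.Size([2, 4])

The sketch assumes both modalities are already projected to a common feature dimension; in practice, separate audio and video encoders would produce these frame-level features before the cross-attention stage.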

Funder

Russian Science Foundation

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

