A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

Authors:

Xu Chujie¹, Du Yong¹, Wang Jingzi², Zheng Wenjie¹, Li Tiejun¹, Yuan Zhansheng¹

Affiliations:

1. School of Ocean Information Engineering, Jimei University, Xiamen, People's Republic of China

2. Department of Computer Science, National Chengchi University, Taiwan, People's Republic of China

Abstract

Emotion recognition in conversations (ERC) is increasingly applied in IoT devices. Deep learning‐based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods adopt attention mechanisms to fuse information from different modalities, they ignore the complementarity between modalities. The joint cross‐attention model was introduced to alleviate this issue; however, it does not utilize multi‐scale feature information from the different modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers and context relationships for facial expression recognition based on audio‐visual (A‐V) information. Specifically, we adopt a separate deep network to extract features from each modality. For the visual (V) modality, we construct graph data from the patch embeddings extracted by the transformer encoder. We also integrate graph convolution, which can leverage intra‐modality relationships, with the transformer encoder. The deep features from different layers are then fed to a hierarchical fusion module to enhance the feature representation. Finally, we use the joint cross‐attention mechanism to exploit the complementary inter‐modality relationships. To validate the proposed model, we conducted experiments on the AffWild2 and CMU‐MOSI datasets. The results show that the proposed model achieves highly promising performance compared with the joint cross‐attention model and other methods.
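
The abstract does not specify how the visual graph is built or which graph convolution is used. As a rough illustration only, the sketch below assumes a k‐nearest‐neighbour graph over patch embeddings (cosine similarity) and a single standard normalized propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W). The function names `cosine`, `knn_adjacency` and `gcn_layer`, the choice of k, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_adjacency(patches, k):
    """Symmetric 0/1 adjacency linking each patch to its k most similar patches.

    Assumption: the paper may build its graph differently.
    """
    n = len(patches)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(patches[i], patches[j]),
                        reverse=True)
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_layer(patches, adj, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n, d_in, d_out = len(patches), len(weight), len(weight[0])
    # Add self-loops, then normalize symmetrically by node degree.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features for every patch.
    agg = [[sum(norm[i][j] * patches[j][d] for j in range(n))
            for d in range(d_in)] for i in range(n)]
    # Apply the linear map W followed by ReLU.
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Toy input: 4 patch embeddings of dimension 2 (made-up numbers).
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_adjacency(patches, k=1)
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
out = gcn_layer(patches, adj, identity)
```

In this toy case each patch is averaged with its single nearest neighbour, so similar patches move toward a shared representation. In a trained model the weight matrix would be learned and the refined patch features would feed the later fusion stages.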

Publisher

Wiley

Subject

Artificial Intelligence, Computational Mathematics
