A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

Published:2023-10-25
ISSN:0824-7935
Container-title:Computational Intelligence
language:en

Author:

Xu Chujie¹,Du Yong¹,Wang Jingzi²,Zheng Wenjie¹,Li Tiejun¹,Yuan Zhansheng¹

Affiliation:

1. School of Ocean Information Engineering Jimei University Xiamen People's Republic of China

2. Department of Computer Science National Chengchi University Taiwan People's Republic of China

Abstract

Emotion recognition in conversations (ERC) is increasingly applied in IoT devices. Deep learning‐based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Most existing methods adopt attention mechanisms to fuse information from different modalities, but they ignore the complementarity between modalities. The joint cross‐attention model was introduced to alleviate this issue; however, it does not utilize multi‐scale feature information from the different modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers and context relationships for facial expression recognition based on audio‐visual (A‐V) information. Specifically, we adopt different deep networks to extract features from each modality individually. For the V modality, we construct V graph data from patch embeddings extracted by the transformer encoder. We also combine graph convolution, which can leverage intra‐modality relationships, with the transformer encoder. The deep features from different layers are then fed to the hierarchical fusion module to enhance the feature representation. Finally, we use the joint cross‐attention mechanism to exploit the complementary inter‐modality relationships. To validate the proposed model, we conducted experiments on the AffWild2 and CMU‐MOSI datasets. The results show that the proposed model achieves highly promising performance compared with the joint cross‐attention model and other methods.
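
The cross‐attention step mentioned in the abstract can be illustrated with a minimal sketch. This is generic scaled dot‐product cross‐attention between two modalities, not the paper's joint cross‐attention formulation or the JHGCN architecture. The feature shapes, the absence of learned projection matrices, the residual sum, and the mean‐pool‐and‐concatenate fusion are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats):
    """Let one modality (queries) attend to another (context).

    query_feats:   array of shape (Tq, d)
    context_feats: array of shape (Tc, d)
    Returns the attended features (Tq, d) and the attention weights (Tq, Tc).
    No learned projections are used; this is an illustrative simplification.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_feats, weights


# Hypothetical inputs: 4 audio frames and 6 visual tokens, each 8-dimensional.
rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 8))
visual = rng.standard_normal((6, 8))

# Each modality attends to the other one.
audio_att, w_av = cross_attention(audio, visual)
visual_att, w_va = cross_attention(visual, audio)

# Assumed fusion: residual sum, mean-pool over time, then concatenate.
fused = np.concatenate([
    (audio + audio_att).mean(axis=0),
    (visual + visual_att).mean(axis=0),
])
```

Each row of `w_av` is a probability distribution over the visual tokens, so every audio frame receives a weighted summary of the visual stream, and `w_va` does the same in the other direction. A trained model would normally add learned query, key, and value projections before the dot product.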

Publisher

Wiley

Subject

Artificial Intelligence,Computational Mathematics

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/coin.12607

References: 67 articles.

1. Emotion Recognition and Its Applications

2. Modulation of emotion by cognition and cognition by emotion;Blair KS;Neuroimage,2007

3. An argument for basic emotions

4. Chen J, Chen Z, Chi Z, et al. Facial expression recognition based on facial components detection and HOG features. International workshops on electrical and computer engineering subfields; 2014 pp. 884–888.

5. Berretti S, Del Bimbo A, Pala P, et al. A set of selected SIFT features for 3D facial expression recognition. Paper presented at: 2010 20th International Conference on Pattern Recognition. IEEE; 2010 pp. 4125–4128.

Cited by 2 articles.

1. Hybrid Grey Wolf Optimizer with Ant Lion Optimization Algorithm based Deep Learning for Face Expression Recognition;2024 International Conference on Integrated Circuits and Communication Systems (ICICACS);2024-02-23

2. Context Transformer and Adaptive Method with Visual Transformer for Robust Facial Expression Recognition;Applied Sciences;2024-02-14