Affiliation:
1. Key Laboratory of Optic-Electronic and Communication, Jiangxi Science and Technology Normal University, Nanchang 330038, China
2. Nanchang Key Laboratory of Failure Perception & Green Energy Materials Intelligent Manufacturing, Nanchang 330038, China
Abstract
A micro-expression (ME) is a spontaneous facial expression that usually occurs instantaneously after an emotion-evoking event and is difficult to disguise. Numerous convolutional neural network (CNN)-based models have been widely explored for recognizing MEs owing to their strong local feature representation ability on images. However, the main drawback of current methods is their inability to fully extract holistic contextual information from ME images. To achieve efficient ME representation learning from diverse perspectives, this paper uses Transformer variants as the main backbone and a dual-branch architecture as the main framework to extract meaningful multi-modal contextual features for ME recognition (MER). The first branch leverages an optical flow operator to facilitate motion information extraction between ME sequences, and the resulting optical flow maps are fed into a Swin Transformer to acquire a motion–spatial representation. The second branch sends the apex frame of an ME clip directly to MobileViT (Mobile Vision Transformer), which captures the local–global features of MEs. More importantly, to achieve optimal feature-stream fusion, a cross attention block (CAB) is designed to let the features extracted by each branch interact for adaptive fusion. Extensive experimental comparisons on three publicly available ME benchmarks show that the proposed method outperforms existing MER methods and achieves an accuracy of 81.6% on the combined database.
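The abstract does not specify the internals of the cross attention block, so the following is only a minimal NumPy sketch of the general idea: tokens from one branch act as queries while tokens from the other branch supply keys and values, and the two directions are merged afterwards. The token counts, feature dimension, and concatenation-based merge are all assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """One cross-attention pass: queries from one branch,
    keys/values from the other branch (single head, no projections)."""
    d_k = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)   # (Nq, Nkv) similarities
    weights = softmax(scores, axis=-1)             # attend over the other branch's tokens
    return weights @ kv_feats                      # (Nq, d) fused features

# Hypothetical token sets from the two branches
rng = np.random.default_rng(0)
flow_tokens = rng.standard_normal((49, 64))   # Swin branch (optical flow maps)
apex_tokens = rng.standard_normal((49, 64))   # MobileViT branch (apex frame)

# Bidirectional interaction, then a simple concatenation merge
flow_fused = cross_attention(flow_tokens, apex_tokens)
apex_fused = cross_attention(apex_tokens, flow_tokens)
fused = np.concatenate([flow_fused, apex_fused], axis=-1)
print(fused.shape)  # → (49, 128)
```

A trainable version would add learned query/key/value projections and multiple heads; the point here is only the query-from-one-branch, key/value-from-the-other interaction pattern.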
Funder
National Natural Science Foundation of China
Natural Science Foundation of Jiangxi Province of China
Jiangxi Province Graduate Innovation Special Fund Project
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering