Affiliation:
1. Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China
Abstract
In recent years, significant progress has been made in facial expression recognition methods. However, tasks related to facial expression recognition in real environments still require further research. This paper proposes a tri-cross-attention transformer with a multi-feature fusion network (TriCAFFNet) to improve facial expression recognition performance under challenging conditions. By combining LBP (Local Binary Pattern) features, HOG (Histogram of Oriented Gradients) features, landmark features, and CNN (convolutional neural network) features from facial images, the model is provided with a rich input to improve its ability to discern subtle differences between images. Additionally, tri-cross-attention blocks are designed to facilitate information exchange between different features, enabling mutual guidance among different features to capture salient attention. Extensive experiments on several widely used datasets show that our TriCAFFNet achieves the SOTA performance on RAF-DB with 92.17%, AffectNet (7 cls) with 67.40%, and AffectNet (8 cls) with 63.49%, respectively.
Funder
General Project for Education of National Social Science Fund