Contrastive Learning of View-invariant Representations for Facial Expressions Recognition

Published:2023-12-11 Issue:4 Volume:20 Page:1-22
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Roy Shuvendu¹, Etemad Ali¹

Affiliation:

1. Dept. ECE and Ingenuity Labs Research Institute, Queen’s University, Canada

Abstract

Although there has been much progress in the area of facial expression recognition (FER), most existing methods suffer when presented with images captured from viewing angles that are non-frontal and substantially different from those used in training. In this article, we propose ViewFX, a novel view-invariant FER framework based on contrastive learning, capable of accurately classifying facial expressions regardless of the input viewing angle during inference. ViewFX learns view-invariant features of expression using a proposed self-supervised contrastive loss, which brings together different views of the same subject with a particular expression in the embedding space. We also introduce a supervised contrastive loss to push the learned view-invariant features of each expression away from those of other expressions. Since facial expressions are often distinguished by very subtle differences in the learned feature space, we incorporate the Barlow twins loss to reduce redundancy and correlation among the learned representations. The proposed method is a substantial extension of our previously proposed CL-MEx, which only had a self-supervised loss. We test the proposed framework on two public multi-view facial expression recognition datasets, KDEF and DDCF. The experiments demonstrate that our approach outperforms previous works in the area and sets a new state of the art on both datasets while showing considerably less sensitivity to challenging angles and to the number of output labels used for training. We also perform detailed sensitivity and ablation experiments to evaluate the impact of the different components of our model as well as its sensitivity to different parameters.
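
The three loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a standard InfoNCE/SupCon-style contrastive form and the commonly used Barlow twins formulation, and the function names, temperature, off-diagonal weight, and loss weights (`w_view`, `w_sup`, `w_bt`) are all hypothetical.

```python
import numpy as np


def contrastive_loss(z, positive_ids, temperature=0.1):
    """SupCon-style loss: rows of z sharing a positive_id are pulled together.

    Assumed form, not taken from the paper. Using (subject, expression) ids
    gives a view-invariance term; using expression labels alone gives a
    supervised term that separates expressions.
    """
    z = np.asarray(z, dtype=float)
    positive_ids = np.asarray(positive_ids)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    pos_mask = (positive_ids[:, None] == positive_ids[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors that have at least one positive
    if not valid.any():
        return 0.0
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    # Cosine similarities scaled by temperature; self-similarity is excluded.
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


def barlow_twins_loss(z_a, z_b, offdiag_weight=5e-3):
    """Barlow twins loss: drive the cross-correlation matrix toward identity."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    n = z_a.shape[0]
    # Standardize each feature dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-12)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-12)
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = ((diag - 1.0) ** 2).sum()          # invariance term
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # redundancy term
    return float(on_diag + offdiag_weight * off_diag)


def combined_loss(z_view_a, z_view_b, subject, expression,
                  w_view=1.0, w_sup=1.0, w_bt=1.0):
    """Weighted sum of the three terms; the weights are illustrative only.

    Row i of z_view_a and z_view_b is assumed to hold the embedding of the
    same subject and expression captured from two different angles.
    """
    z = np.concatenate([z_view_a, z_view_b])
    subj = np.concatenate([subject, subject])
    expr = np.concatenate([expression, expression])
    view_ids = subj * (expr.max() + 1) + expr  # unique id per (subject, expression)
    return (w_view * contrastive_loss(z, view_ids)
            + w_sup * contrastive_loss(z, expr)
            + w_bt * barlow_twins_loss(z_view_a, z_view_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_a = rng.normal(size=(6, 4))
    z_b = z_a + 0.1 * rng.normal(size=(6, 4))  # second view: slightly perturbed
    subject = np.array([0, 0, 1, 1, 2, 2])
    expression = np.array([0, 1, 0, 1, 0, 1])
    print(combined_loss(z_a, z_b, subject, expression))
```

In this sketch the same contrastive function serves both roles, differing only in which samples count as positives; whether ViewFX shares a formulation between its two contrastive terms is not stated in the abstract.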

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3632960
