Affiliation:
1. School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
2. School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
Abstract
In recent years, the analysis of macro- and micro-expressions has drawn the attention of researchers. These expressions provide visual cues to an individual’s emotions, which can be used in a broad range of potential applications such as lie detection and policing. In this paper, we address the challenge of spotting facial macro- and micro-expressions in videos and present compelling results obtained by applying a deep learning approach to optical flow features. Unlike other deep learning approaches, which are mainly based on Convolutional Neural Networks (CNNs), we propose a Transformer-based approach that predicts a score indicating the probability of a frame lying within an expression interval. In contrast to other Transformer-based models that achieve high performance by being pre-trained on large datasets, our model, called SL-Swin, incorporates Shifted Patch Tokenization and Locality Self-Attention into the backbone Swin Transformer network and effectively spots macro- and micro-expressions when trained from scratch on small-size expression datasets. Our evaluation outcomes surpass the MEGC 2022 spotting baseline, with an overall F1-score of 0.1366. Our approach also performs well on the MEGC 2021 spotting task, achieving overall F1-scores of 0.1824 and 0.1357 on CAS(ME)2 and SAMM Long Videos, respectively. The code is publicly available on GitHub.
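For readers who want a concrete picture of the two components named in the abstract, the following is a minimal PyTorch sketch of Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) as they are generally described in the literature. It is an illustration under assumed defaults (patch size 4, embedding dimension 96, 3 heads), not the authors' SL-Swin implementation, and it omits the Swin backbone, window attention, and the per-frame scoring head.

```python
# Illustrative sketch of SPT and LSA; sizes and module names are assumptions,
# not the SL-Swin implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the input with four diagonally shifted copies before patch embedding."""

    def __init__(self, in_chans=3, embed_dim=96, patch_size=4):
        super().__init__()
        self.patch_size = patch_size
        self.norm = nn.LayerNorm(5 * in_chans * patch_size * patch_size)
        self.proj = nn.Linear(5 * in_chans * patch_size * patch_size, embed_dim)

    def forward(self, x):                               # x: (B, C, H, W)
        B, C, H, W = x.shape
        s = self.patch_size // 2
        p = F.pad(x, (s, s, s, s))                      # zero-pad all four sides
        # four diagonally shifted copies of the input (up-left, up-right, down-left, down-right)
        shifted = [p[:, :, a:a + H, b:b + W] for a in (0, 2 * s) for b in (0, 2 * s)]
        x = torch.cat([x] + shifted, dim=1)             # (B, 5C, H, W)
        # non-overlapping patches, then layer norm + linear projection
        patches = F.unfold(x, kernel_size=self.patch_size, stride=self.patch_size)
        patches = patches.transpose(1, 2)               # (B, num_patches, 5C*P*P)
        return self.proj(self.norm(patches))            # (B, num_patches, embed_dim)


class LocalitySelfAttention(nn.Module):
    """Self-attention with a learnable temperature and diagonal (self-token) masking."""

    def __init__(self, dim=96, num_heads=3):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                               # x: (B, N, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # mask the diagonal so each token attends to the other tokens, not itself
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

The intent of the two pieces, as the abstract indicates for small-size expression datasets, is that SPT enlarges the spatial context embedded into each token by concatenating shifted copies before patchification, while LSA sharpens the attention distribution through a learnable temperature and by masking self-token attention.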
Funder
Special Construction Fund of the Faculty of Engineering
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by: 4 articles.