Abstract
The proposed method has 30 streams: 15 spatial streams and 15 temporal streams, with each spatial stream paired with a corresponding temporal stream; this pairing relates the work to the concept of symmetry. Classifying facial expressions in video is difficult owing to the gap between visual descriptors and emotions. To bridge this gap, a new video descriptor for facial expression recognition is presented that aggregates spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates a state-of-the-art 30-stream network with a trainable spatial–temporal feature aggregation layer and is end-to-end trainable for video-based facial expression recognition. It can therefore effectively avoid overfitting to the limited emotional video datasets, and the trainable aggregation learns a better representation of an entire video. Different schemas for pooling spatial–temporal features are investigated, and the spatial and temporal streams are best aggregated by the proposed method. Extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that the framework achieves promising performance and outperforms state-of-the-art strategies.
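To illustrate the general idea of trainable aggregation over paired stream features, the following is a minimal sketch, not the authors' implementation, assuming a PyTorch setting in which each of the 30 streams emits a fixed-length feature vector per video. The feature dimension (512), the number of emotion classes (7), and the attention-style softmax weighting are assumptions for illustration only.

# Minimal sketch (assumed PyTorch setup, not the paper's actual layer):
# learn per-stream weights to pool features from 15 spatial + 15 temporal streams.
import torch
import torch.nn as nn

class StreamAggregator(nn.Module):
    """Trainable pooling of per-stream video features followed by classification."""

    def __init__(self, num_streams=30, feat_dim=512, num_classes=7):
        super().__init__()
        # One trainable score per stream feature, turned into a softmax weight.
        self.score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, stream_feats):
        # stream_feats: (batch, num_streams, feat_dim), e.g. 15 spatial + 15 temporal.
        weights = torch.softmax(self.score(stream_feats), dim=1)  # (batch, num_streams, 1)
        pooled = (weights * stream_feats).sum(dim=1)              # (batch, feat_dim)
        return self.classifier(pooled)

# Usage: hypothetical features from the 30 streams for a batch of 4 videos.
feats = torch.randn(4, 30, 512)
logits = StreamAggregator()(feats)
print(logits.shape)  # torch.Size([4, 7])

Because the aggregation weights are learned jointly with the classifier, such a layer can be trained end to end, which is the property the abstract emphasizes.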
Funder
Zhejiang Provincial Natural Science Foundation of China
National Natural Science Foundation of China
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
15 articles.