Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition

Published: 2024-04-11
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Tu Zhewei¹, Shu Xiangbo¹, Huang Peng¹, Yan Rui², Liu Zhenxing³, Zhang Jiachao⁴

Affiliation:

1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

2. Department of Computer Science and Technology, Nanjing University, Nanjing, China

3. School of Information Science and Technology, Wuhan University of Science and Technology, Wuhan, China

4. Artificial Intelligence Industrial Technology Research Institute, Nanjing Institute of Technology, Nanjing, China

Abstract

Semi-supervised action recognition is a challenging yet promising task because it relies little on costly labeled videos. One prominent approach, inspired by the FixMatch framework that dominates semi-supervised image classification, applies frame-level weak/strong augmentations to learn rich representations. However, this approach mainly perturbs texture and scale, which limits its ability to learn action representations from videos with spatiotemporal redundancy and complexity. We therefore revisit the weak/strong augmentation strategy of FixMatch and propose a novel Frame- and Feature-level augmentation FixMatch framework (dubbed F²-FixMatch) that learns richer action representations and is robust to complex, dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that first applies weak/strong augmentations at the frame level and then applies perturbation at the feature level, yielding four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme that promotes the consistency of features across the different augmented versions based on the pseudo labels. Extensive experiments on several public datasets demonstrate that F²-FixMatch achieves performance gains over current state-of-the-art methods. The source code of F²-FixMatch is publicly available at https://github.com/zwtu/F2FixMatch.
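
The two-stage augmentation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy example under stated assumptions. The helper names (weak_augment, strong_augment, encode, perturb_feature, four_views, consistency_loss), the choice of flip and noise as frame-level augmentations, dropout as the feature-level perturbation, the linear encoder, and the 0.95 confidence threshold are all hypothetical. It also uses one shared classifier instead of the multihead scheme the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(clip):
    # Hypothetical weak frame-level augmentation: random horizontal flip.
    # clip has shape (T, H, W, C) with values in [0, 1].
    return clip[:, :, ::-1, :] if rng.random() < 0.5 else clip

def strong_augment(clip):
    # Hypothetical strong frame-level augmentation: flip plus pixel noise.
    noisy = weak_augment(clip) + rng.normal(0.0, 0.1, size=clip.shape)
    return np.clip(noisy, 0.0, 1.0)

def encode(clip, w_enc):
    # Toy encoder (assumption): pool over time and space, then a linear map.
    pooled = clip.mean(axis=(0, 1, 2))   # shape (C,)
    return pooled @ w_enc                # shape (D,)

def perturb_feature(feat, drop_prob=0.5):
    # Feature-level perturbation, here inverted dropout (assumption).
    mask = rng.random(feat.shape) >= drop_prob
    return feat * mask / (1.0 - drop_prob)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def four_views(clip, w_enc):
    # Frame-level augmentation first, feature-level perturbation second,
    # giving four augmented feature vectors for one unlabeled clip.
    f_weak = encode(weak_augment(clip), w_enc)
    f_strong = encode(strong_augment(clip), w_enc)
    return {
        "weak": f_weak,
        "weak_pert": perturb_feature(f_weak),
        "strong": f_strong,
        "strong_pert": perturb_feature(f_strong),
    }

def consistency_loss(views, w_cls, threshold=0.95):
    # FixMatch-style rule: the weak view supplies the pseudo label, and the
    # other views are trained toward it only if the prediction is confident.
    p_weak = softmax(views["weak"] @ w_cls)
    pseudo = int(p_weak.argmax())
    if p_weak.max() < threshold:
        return 0.0, pseudo
    targets = ("weak_pert", "strong", "strong_pert")
    losses = [-np.log(softmax(views[k] @ w_cls)[pseudo] + 1e-12) for k in targets]
    return float(np.mean(losses)), pseudo

clip = rng.random((8, 16, 16, 3))        # 8 frames of 16x16 RGB
w_enc = rng.normal(size=(3, 32))         # toy encoder weights
w_cls = rng.normal(size=(32, 5))         # toy 5-class classifier
views = four_views(clip, w_enc)
loss, pseudo = consistency_loss(views, w_cls, threshold=0.0)
```

Setting threshold=0.0 in the usage lines forces the loss to be computed for the random toy weights; with a realistic threshold, low-confidence clips contribute zero loss.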

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3655025

References (89 articles)

1. Actor-Aware Self-Supervised Learning for Semi-Supervised Video Representation Learning

2. Philip Bachman, Ouais Alsharif, and Doina Precup. 2014. Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems (NeurIPS). 1–9.

3. Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is space-time attention all you need for video understanding? In International Conference on Machine Learning (ICML). 813–824.

4. David Berthelot, Nicholas Carlini, Ekin D Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel. 2019. ReMixMatch: Semi-supervised learning with distribution alignment and augmentation anchoring. arXiv preprint arXiv:1911.09785 (2019).

5. David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. 2019. MixMatch: A holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS). 1–11.

Cited by 1 article.

1. Integrating pseudo labeling with contrastive clustering for transformer-based semi-supervised action recognition. Applied Intelligence. 2024-08-10.