Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition

Authors:

Tu Zhewei¹, Shu Xiangbo¹, Huang Peng¹, Yan Rui², Liu Zhenxing³, Zhang Jiachao⁴

Affiliations:

1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

2. Department of Computer Science and Technology, Nanjing University, Nanjing, China

3. School of Information Science and Technology, Wuhan University of Science and Technology, Wuhan, China

4. Artificial Intelligence Industrial Technology Research Institute, Nanjing Institute of Technology, Nanjing, China

Abstract

Semi-supervised action recognition is a challenging yet promising task because it relies little on costly labeled videos. One prominent solution, inspired by the FixMatch framework that dominates semi-supervised image classification, explores frame-level weak/strong augmentations to learn rich representations. However, such augmentations mainly introduce perturbations in texture and scale, which limits the learning of action representations from videos with spatiotemporal redundancy and complexity. We therefore revisit the weak/strong augmentation trick of FixMatch and propose a novel Frame- and Feature-level augmentation FixMatch framework (dubbed F²-FixMatch) that learns richer action representations and is robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that first applies weak/strong augmentations at the frame level and then applies further perturbation at the feature level, yielding four types of augmented features drawn from broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme that promotes the consistency of features across the different augmented versions based on pseudo labels. Extensive experiments on several public datasets demonstrate that F²-FixMatch achieves performance gains over current state-of-the-art methods. The source code of F²-FixMatch is publicly available at https://github.com/zwtu/F2FixMatch.
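The abstract only outlines the training objective. As a rough illustration, the pure-Python sketch below shows a FixMatch-style masked pseudo-label loss applied across four augmented features of each unlabeled clip. It is not the authors' F²-FixMatch implementation (that is in the linked repository): the composition of the four views (frame-level weak and strong features, each with and without a feature-level perturbation), the inverted-dropout perturbation, the single shared linear head, the 0.95 threshold, and all function names are hypothetical choices made for illustration, and the multihead structure of MPL is not modeled.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def log_softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def softmax(logits: Sequence[float]) -> Vector:
    return [math.exp(v) for v in log_softmax(logits)]


def linear_head(feature: Sequence[float], weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> Vector:
    """Hypothetical shared classifier: logits[c] = sum_d W[c][d] * f[d] + b[c]."""
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(weights, bias)]


def perturb_feature(feature: Sequence[float], drop_p: float,
                    rng: random.Random) -> Vector:
    """Assumed feature-level perturbation: inverted dropout, drop_p in [0, 1).
    This is a stand-in, not the paper's feature-level operator."""
    keep = 1.0 - drop_p
    return [f / keep if rng.random() < keep else 0.0 for f in feature]


def four_views(feat_weak: Sequence[float], feat_strong: Sequence[float],
               drop_p: float, rng: random.Random) -> List[Vector]:
    """Assumed four feature types: frame-weak, frame-strong, and each of them
    with an extra feature-level perturbation."""
    return [list(feat_weak), list(feat_strong),
            perturb_feature(feat_weak, drop_p, rng),
            perturb_feature(feat_strong, drop_p, rng)]


def unlabeled_loss(batch_weak: Sequence[Sequence[float]],
                   batch_strong: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]], bias: Sequence[float],
                   threshold: float = 0.95, drop_p: float = 0.5,
                   seed: int = 0) -> float:
    """FixMatch-style masked pseudo-label loss, averaged over the unlabeled batch."""
    rng = random.Random(seed)
    total = 0.0
    for feat_weak, feat_strong in zip(batch_weak, batch_strong):
        views = four_views(feat_weak, feat_strong, drop_p, rng)
        # The pseudo label comes from the frame-weak view (a fixed target; in a
        # real framework no gradient would flow through it).
        probs = softmax(linear_head(views[0], weights, bias))
        confidence = max(probs)
        if confidence < threshold:
            continue  # masked out: adds 0 but still counts in the denominator
        pseudo = probs.index(confidence)
        # Cross-entropy of the three remaining views against the hard pseudo label.
        ce = [-log_softmax(linear_head(v, weights, bias))[pseudo] for v in views[1:]]
        total += sum(ce) / len(ce)
    return total / max(len(batch_weak), 1)
```

Because masked clips still count in the denominator, a batch with one confident and one unconfident clip yields half the loss of the confident clip alone, mirroring how FixMatch averages its unlabeled term over the whole unlabeled batch.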

Publisher

Association for Computing Machinery (ACM)
