Temporal Dropout for Weakly Supervised Action Localization

Published:2023-02-25 Issue:3 Volume:19 Page:1-24
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Xie Chi¹, Zhuang Zikun¹, Zhao Shengjie¹, Liang Shuang¹

Affiliation:

1. Tongji University, Jiading Qu, Shanghai Shi, China

Abstract

Weakly supervised action localization is a challenging problem in video understanding and action recognition. Existing models usually formulate the training process as direct classification using video-level supervision. They tend to locate only the most discriminative parts of action instances and produce temporally incomplete detection results. A natural solution to this problem, the adversarial erasing strategy, is to remove such parts from training so that models can attend to complementary parts. Previous works do this in an offline and heuristic way. They adopt a multi-stage pipeline, where discriminative regions are determined and erased under the guidance of detection results from the previous stage. Such a pipeline can be both ineffective and inefficient, possibly hindering the overall performance. In contrast, we combine adversarial erasing with the dropout mechanism and propose a Temporal Dropout Module that learns where to remove in a data-driven and online manner. This plug-and-play module is trained without iterative stages, which not only simplifies the pipeline but also makes the regularization during training easier and more adaptive. Experiments show that the proposed method outperforms previous erasing-based methods by a large margin. More importantly, it achieves universal improvement when plugged into various direct classification methods and obtains state-of-the-art performance.
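
The erasing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Temporal Dropout Module, which learns where to remove; it uses a fixed top-k heuristic, closer to the offline erasing the abstract attributes to previous works. The function names, the `drop_ratio` parameter, and the scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def erase_top_snippets(scores: Sequence[float], drop_ratio: float = 0.25) -> List[int]:
    """Return a 0/1 keep-mask that drops the highest-scoring snippets.

    Assumption: `scores` holds one discriminativeness score per temporal
    snippet (for example, an attention or class-activation value).
    """
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    num_drop = int(len(scores) * drop_ratio)
    # Rank snippet indices by score, most discriminative first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:num_drop])
    return [0 if i in dropped else 1 for i in range(len(scores))]


def apply_mask(features: Sequence[Sequence[float]], mask: Sequence[int]) -> List[List[float]]:
    """Zero out the feature vectors of dropped snippets."""
    return [[value * keep for value in row] for row, keep in zip(features, mask)]


# Hypothetical scores for an 8-snippet video; snippets 2 and 3 dominate.
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.05, 0.4]
mask = erase_top_snippets(scores, drop_ratio=0.25)
features = [[1.0, 2.0] for _ in scores]
erased = apply_mask(features, mask)
```

Training a classifier on `erased` rather than `features` would force it to rely on the remaining, less discriminative snippets, which is the motivation the abstract gives for erasing. A learned, online variant would replace the fixed ranking with a mask predicted during training.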

Funder

National Natural Science Foundation of China

Natural Science Foundation of Shanghai

Shanghai Innovation Action Project of Science and Technology

National Key Research and Development Project

Shanghai Municipal Science and Technology Major Project

Fundamental Research Funds for the Central Universities

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3567827

Reference: 62 articles.

1. Anurag Arnab, Chen Sun, Arsha Nagrani, and Cordelia Schmid. 2020. Uncertainty-aware weakly supervised action detection from untrimmed videos. In Proceedings of the European Conference on Computer Vision. Springer, 751–768.

2. ActivityNet: A large-scale video benchmark for human activity understanding

3. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

4. Rethinking the Faster R-CNN Architecture for Temporal Action Localization

5. Relation Attention for Temporal Action Localization

Cited by 5 articles.

1. Weakly supervised temporal action localization with actionness-guided false positive suppression;Neural Networks;2024-07

2. Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-03-08

3. Addressing Missing Part Interaction in Skeleton-Text Contrastive Learning for Action Recognition;2024

4. Relation with Free Objects for Action Recognition;ACM Transactions on Multimedia Computing, Communications, and Applications;2023-10-18

5. Patch excitation network for boxless action recognition in still images;The Visual Computer;2023-09-25