Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

Published: 2020-04-03 Issue: 07 Volume: 34 Page: 11701-11708
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Luo Dezhao, Liu Chang, Zhou Yu, Yang Dongbao, Ma Can, Ye Qixiang, Wang Weiping

Abstract

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatio-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations to the withheld clips. Finally, it fills the blanks with the “options” and learns representations by predicting the category of the operation applied to each clip. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatio-temporal representation models (3D-CNNs) and apply them to action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform state-of-the-art self-supervised models by significant margins.
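The blank-and-option procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the clip representation (nested lists), the helper names `make_clip` and `make_cloze_sample`, and the three operations, which are simple stand-ins rather than the operation set used in the paper. The sketch only builds a training sample and its label; the 3D-CNN that would predict the label is not shown.

```python
import random

# A clip is a list of frames; a frame is a list of rows; a row is a list of pixels.

def identity(clip):
    # No operation: return an unchanged copy of the clip.
    return [[row[:] for row in frame] for frame in clip]

def rotate_180(clip):
    # Illustrative spatial operation: rotate every frame by 180 degrees.
    return [[row[::-1] for row in frame[::-1]] for frame in clip]

def reverse_time(clip):
    # Illustrative temporal operation: play the frames in reverse order.
    return [[row[:] for row in frame] for frame in clip[::-1]]

# Hypothetical option set; the index of an operation is its class label.
OPERATIONS = [identity, rotate_180, reverse_time]

def make_clip(start, frames=2, height=2, width=2):
    # Toy clip whose pixel values are all distinct, starting at `start`.
    return [
        [[start + (t * height + y) * width + x for x in range(width)]
         for y in range(height)]
        for t in range(frames)
    ]

def make_cloze_sample(clips, blank_index, rng):
    # 1. Withhold the clip at `blank_index` (the "blank").
    # 2. Apply a randomly chosen operation to it (the "option").
    # 3. Fill the blank with the option and return the operation label.
    label = rng.randrange(len(OPERATIONS))
    option = OPERATIONS[label](clips[blank_index])
    filled = list(clips)
    filled[blank_index] = option
    return filled, label

clips = [make_clip(0), make_clip(100), make_clip(200)]
filled, label = make_cloze_sample(clips, blank_index=1, rng=random.Random(0))
```

A model trained on such samples would receive `filled` as input and `label` as the classification target, so the supervision comes entirely from the operation that was applied rather than from human annotation.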

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 69 articles.

1. Knowledge-guided pre-training and fine-tuning: Video representation learning for action recognition;Neurocomputing;2024-02

2. Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection;IEEE Transactions on Pattern Analysis and Machine Intelligence;2024-01

3. Revisiting Hard Negative Mining in Contrastive Learning for Visual Understanding;Electronics;2023-12-04

4. Cross-Architecture Relational Consistency for Point Cloud Self-Supervised Learning;2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI);2023-11-06

5. Data-Efficient Masked Video Modeling for Self-supervised Action Recognition;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26