Author:
Hu Kai, Shen Chaowen, Wang Tianyan, Xu Keer, Xia Qingfeng, Xia Min, Cai Chengxue
Abstract
Temporal Action Detection (TAD) aims to accurately capture each action interval in an untrimmed video and to understand human actions. This paper comprehensively surveys the state-of-the-art techniques and models used for the TAD task. First, it analyzes the field using CiteSpace and introduces the relevant datasets. Second, it summarizes three types of methods from the design perspective, i.e., anchor-based, boundary-based, and query-based. Third, it summarizes three types of methods from the learning perspective, i.e., fully supervised, weakly supervised, and unsupervised. Finally, it explores current problems and proposes prospects for the TAD task.
Funder
Funding of Special Development Project of Tianchang Intelligent Equipment and Instrument Research Institute
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Reference
205 articles.
Cited by
6 articles.