A survey on deep learning-based spatio-temporal action detection

Published:2024-02-09 Issue:04 Volume:22 Page:
ISSN:0219-6913
Container-title:International Journal of Wavelets, Multiresolution and Information Processing
language:en
Short-container-title:Int. J. Wavelets Multiresolut Inf. Process.

Author:

Wang Peng¹, Zeng Fanwei², Qian Yuntao¹

Affiliation:

1. College of Computer Science, Zhejiang University, Hangzhou, Zhejiang 310007, P. R. China

2. Ant Group, Hangzhou, Zhejiang 310007, P. R. China

Abstract

Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time. It has become a particularly active area of research in computer vision because of its rapidly growing range of real-world applications, such as autonomous driving, visual surveillance, and entertainment. In recent years, considerable effort has been devoted to building robust and effective frameworks for STAD. This paper provides a comprehensive review of state-of-the-art deep learning-based methods for STAD. First, a taxonomy is developed to organize these methods. Next, the linking algorithms, which associate frame- or clip-level detection results to form action tubes, are reviewed. Then, the commonly used benchmark datasets and evaluation metrics are introduced, and the performance of state-of-the-art models is compared. Finally, the paper concludes with a discussion of potential research directions for STAD.

Funder

National Natural Science Foundation of China

National Key R&D Program of China

Publisher

World Scientific Pub Co Pte Ltd

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219691323500662

References: 92 articles.

1. Generative adversarial network based abnormal behavior detection in massive crowd videos: a Hajj case study

2. Hybrid Classifiers for Spatio-Temporal Abnormal Behavior Detection, Tracking, and Recognition in Massive Hajj Crowds

3. CNN-Based Multiple Path Search for Action Tube Detection in Videos

4. Actions as space-time shapes

5. ActivityNet: A large-scale video benchmark for human activity understanding

Cited by 1 article.

1. Online Hierarchical Linking of Action Tubes for Spatio-Temporal Action Detection Based on Multiple Clues. IEEE Access, 2024.