BTM: Boundary Trimming Module for Temporal Action Detection

Published:2022-10-29 Issue:21 Volume:11 Page:3520
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Hamdi Maher, Wen Shiping, Yang Yin

Abstract

Temporal action detection (TAD) aims to recognize the actions in an input video together with their corresponding time spans. While existing techniques accurately recognize actions in manually trimmed videos, current TAD solutions often struggle to identify the precise temporal boundaries of each action, which many real applications require. This paper addresses the problem with a novel Boundary Trimming Module (BTM), a post-processing method that adjusts the temporal boundaries of the actions detected by existing TAD solutions. Specifically, BTM operates on a frame-level classification of the input video and refines each detection by adjusting the frames surrounding the start and end frames of the original result. Experimental results on the THUMOS14 benchmark dataset demonstrate that BTM significantly improves the performance of several existing TAD methods. Moreover, combining BTM with the previous best TAD solution establishes a new state of the art for temporal action detection.
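The abstract describes boundary adjustment only at a high level, so the following is a minimal illustrative sketch of the general idea and not the authors' BTM algorithm. It assumes per-frame action scores in [0, 1] are already available from some frame classifier. The function name `trim_boundaries`, the `window` and `threshold` parameters, and the extend-or-trim rule are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def trim_boundaries(
    frame_scores: List[float],
    start: int,
    end: int,
    window: int = 3,       # hypothetical: max frames a boundary may move
    threshold: float = 0.5,  # hypothetical: score above which a frame is "action"
) -> Tuple[int, int]:
    """Adjust an inclusive [start, end] detection using per-frame scores.

    Illustrative rule (an assumption, not the paper's method):
    - if the boundary frame is an action frame, extend outward over
      adjacent action frames, at most `window` frames;
    - otherwise, trim inward until an action frame is found, at most
      `window` frames.
    """
    n = len(frame_scores)
    is_action = [s >= threshold for s in frame_scores]

    # Refine the start boundary.
    new_start = start
    if is_action[start]:
        lowest = max(0, start - window)
        while new_start > lowest and is_action[new_start - 1]:
            new_start -= 1
    else:
        limit = min(end, start + window)
        while new_start < limit and not is_action[new_start]:
            new_start += 1

    # Refine the end boundary symmetrically.
    new_end = end
    if is_action[end]:
        highest = min(n - 1, end + window)
        while new_end < highest and is_action[new_end + 1]:
            new_end += 1
    else:
        limit = max(start, end - window)
        while new_end > limit and not is_action[new_end]:
            new_end -= 1

    # If trimming crossed over (no action frames found), keep the original.
    if new_start > new_end:
        return start, end
    return new_start, new_end


scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
loose = trim_boundaries(scores, start=1, end=7)   # detection too wide
tight = trim_boundaries(scores, start=3, end=4)   # detection too narrow
```

In this toy input both the overly wide and the overly narrow detection are moved toward the run of high-scoring frames. A real post-processing module would also need to handle noisy scores and class-specific classifiers, which this sketch ignores.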

Funder

Qatar National Research Fund

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/21/3520/pdf
