ProposalVLAD with Proposal-Intra Exploring for Temporal Action Proposal Generation

Published: 2023-02-25, Volume: 19, Issue: 3, Pages: 1-18
ISSN: 1551-6857
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: English
Journal abbreviation: ACM Trans. Multimedia Comput. Commun. Appl.

Authors:

Xing Kai¹, Li Tao¹, Wang Xuanhan¹

Affiliation:

1. University of Electronic Science and Technology of China, Sichuan Province, China

Abstract

Temporal action proposal generation aims to localize the temporal segments of human activities in videos. Current boundary-based proposal generation methods can produce proposals with precise boundaries, but they often suffer from low-quality confidence scores, which are what proposal retrieval relies on. In this article, we propose an effective, end-to-end action proposal generation method named ProposalVLAD with Proposal-Intra Exploring Network (PVPI-Net). We first propose a ProposalVLAD module that dynamically generates global features of the entire video, and then combine these global features with proposal local features to produce the final feature representation of every candidate proposal. We further design a novel Proposal-Intra Loss function (PI-Loss) to generate more reliable proposal confidence scores. Extensive experiments on two large-scale, challenging benchmark datasets (THUMOS’14 and ActivityNet-1.3) show that PVPI-Net achieves significant improvements and sets new records on the temporal action detection task.
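The abstract does not give the equations behind ProposalVLAD or say exactly how global and local features are fused. As a rough illustration only, the Python sketch below uses the standard NetVLAD soft-assignment aggregation as a stand-in for the "global features of the entire video" step, and mean pooling plus concatenation as a hypothetical fusion with proposal local features. The per-snippet input features, the fixed cluster centers `centers`, the sharpness parameter `alpha`, and the helper names `vlad_global_feature` and `proposal_representation` are all assumptions made for this sketch, not the authors' design; in a trained network the centers would be learned parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    # Scale a vector to unit L2 norm; eps avoids division by zero for all-zero vectors.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def _softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vlad_global_feature(snippets: Sequence[Sequence[float]],
                        centers: Sequence[Sequence[float]],
                        alpha: float = 1.0) -> Vector:
    """NetVLAD-style aggregation of T snippet features (D-dim) into one K*D vector.

    Stand-in for the paper's ProposalVLAD global step (assumption, not the authors' module).
    """
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for x in snippets:
        # Soft-assign x to every center: softmax over -alpha * squared distance.
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        weights = _softmax([-alpha * d for d in dists])
        # Accumulate the weighted residual (x - c_k) for each center k.
        for k, c in enumerate(centers):
            for j in range(dim):
                residual_sums[k][j] += weights[k] * (x[j] - c[j])
    # Intra-normalize each center's block, then L2-normalize the concatenation.
    flat = [v for block in residual_sums for v in _l2_normalize(block)]
    return _l2_normalize(flat)


def proposal_representation(snippets: Sequence[Sequence[float]],
                            start: int, end: int,
                            global_feat: Sequence[float]) -> Vector:
    """Hypothetical fusion: mean-pooled local feature of snippets[start:end] + global feature."""
    if not 0 <= start < end <= len(snippets):
        raise ValueError("need 0 <= start < end <= number of snippets")
    segment = snippets[start:end]
    dim = len(segment[0])
    local = [sum(s[j] for s in segment) / len(segment) for j in range(dim)]
    return local + list(global_feat)


if __name__ == "__main__":
    # Toy 2-D snippet features and two fixed (hypothetical) centers.
    snippets = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centers = [[1.0, 0.0], [0.0, 1.0]]
    g = vlad_global_feature(snippets, centers, alpha=10.0)
    rep = proposal_representation(snippets, 0, 2, g)
    print(len(g), len(rep))  # 4 6
```

With a large `alpha`, each snippet is assigned almost entirely to its nearest center, so the aggregate approaches hard-assignment VLAD; a smaller `alpha` spreads each snippet's weight across centers. The concatenation step is only one simple way to combine a video-level descriptor with a proposal-level one.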

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3571747

Cited by 1 article.

1. MBGNet: Multi-branch boundary generation network with temporal context aggregation for temporal action detection. Applied Intelligence, 2024-07-09.