Semantic Proposal for Activity Localization in Videos via Sentence Query

Published:2019-07-17 Volume:33 Page:8199-8206
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Chen Shaoxiang, Jiang Yu-Gang

Abstract

This paper presents an efficient algorithm to tackle temporal localization of activities in videos via sentence queries. The task differs from traditional action localization in three aspects: (1) Activities are combinations of various kinds of actions and may span a long period of time. (2) Sentence queries are not limited to a predefined list of classes. (3) The videos usually contain multiple different activity instances. Traditional proposal-based approaches for action localization that only consider the class-agnostic “actionness” of video snippets are insufficient to tackle this task. We propose a novel Semantic Activity Proposal (SAP) which integrates the semantic information of sentence queries into the proposal generation process to get discriminative activity proposals. Visual and semantic information are jointly utilized for proposal ranking and refinement. We evaluate our algorithm on the TACoS dataset and the Charades-STA dataset. Experimental results show that our algorithm outperforms existing methods on both datasets, and at the same time reduces the number of proposals by a factor of at least 10.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 62 articles.

1. Context-aware relational reasoning for video chunks and frames overlapping in language-based moment localization;Neurocomputing;2024-10

2. Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-09-12

3. SgLFT: Semantic-guided Late Fusion Transformer for video corpus moment retrieval;Neurocomputing;2024-09

4. MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

5. Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement;Proceedings of the 2024 International Conference on Multimedia Retrieval;2024-05-30