Temporally Grounding Language Queries in Videos by Contextual Boundary-Aware Prediction

Published:2020-04-03 Issue:07 Volume:34 Page:12168-12175
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Wang Jingwen, Ma Lin, Jiang Wenhao

Abstract

The task of temporally grounding language queries in videos is to temporally localize the best-matched video segment corresponding to a given language query (sentence). It requires models to perform visual and linguistic understanding simultaneously. Previous work predominantly ignores the precision of segment localization. Sliding-window-based methods use predefined search window sizes and suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 71 articles.

1. Action-guided prompt tuning for video grounding;Information Fusion;2025-01

2. Triadic temporal-semantic alignment for weakly-supervised video moment retrieval;Pattern Recognition;2024-12

3. Context-aware relational reasoning for video chunks and frames overlapping in language-based moment localization;Neurocomputing;2024-10

4. Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding;ACM Transactions on Multimedia Computing, Communications, and Applications;2024-09-12

5. End-to-end dense video grounding via parallel regression;Computer Vision and Image Understanding;2024-05