Joint Modality Synergy and Spatio-temporal Cue Purification for Moment Localization

Published: 2022-06-27
Container-title: Proceedings of the 2022 International Conference on Multimedia Retrieval

Author:

Shen Xingyu¹, Lan Long², Tan Huibin², Zhang Xiang², Ma Xurui³, Luo Zhigang³

Affiliation:

1. Science and Technology on Parallel and Distributed Processing, National University of Defense Technology, College of Computer, National University of Defense Technology, Changsha, China

2. Institute for Quantum Information and State Key Laboratory of High Performance Computing, National University of Defense Technology, College of Computer, National University of Defense Technology, Changsha, China

3. Science and Technology on Parallel and Distributed Processing, National University of Defense Technology, College of Computer, National University of Defense Technology, Changsha, China

Funder

National Natural Science Foundation of China

National Grand R&D Plan

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3512527.3531396

References: 32 articles.

1. Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, and Yuexian Zou. 2021. On Pursuit of Designing Multi-modal Transformer for Video Grounding. In EMNLP. 9810–9823.

2. João Carreira and Andrew Zisserman. 2017. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In CVPR. IEEE Computer Society, 4724–4733.

3. Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, and Tat-Seng Chua. 2018. Temporally Grounding Natural Sentence in Video. In EMNLP. 162–171.

4. Long Chen, Chujie Lu, Siliang Tang, Jun Xiao, Dong Zhang, Chilie Tan, and Xiaolin Li. 2020. Rethinking the Bottom-Up Framework for Query-Based Video Localization. In AAAI. 10551–10558.

5. Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language;Chen Shaoxiang;ECCV,2020

Cited by 4 articles.

1. Event-Oriented State Alignment Network for Weakly Supervised Temporal Language Grounding;Entropy;2024-08-27

2. Semantics-Enriched Cross-Modal Alignment for Complex-Query Video Moment Retrieval;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

3. Temporal Sentence Grounding in Videos: A Survey and Future Directions;IEEE Transactions on Pattern Analysis and Machine Intelligence;2023-08

4. Atomic-action-based Contrastive Network for Weakly Supervised Temporal Language Grounding;2023 IEEE International Conference on Multimedia and Expo (ICME);2023-07