Boosting Video-Text Retrieval with Explicit High-Level Semantics

Published: 2022-10-10
Container-title: Proceedings of the 30th ACM International Conference on Multimedia

Author:

Wang Haoran¹, Xu Di², He Dongliang¹, Li Fu¹, Ji Zhong³, Han Jungong⁴, Ding Errui¹

Affiliation:

1. Department of Computer Vision Technology (VIS), Baidu Inc., Beijing, China

2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

3. Tianjin University, Tianjin, China

4. Computer Science Department, Aberystwyth University, SY23 3FL, UK

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3503161.3548010

References: 55 articles.

1. P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. 2018. Bottom-up and top-down attention for image captioning and VQA. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6077--6086.

2. Localizing Moments in Video with Natural Language

3. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. 2015. VQA: Visual question answering. In IEEE International Conference on Computer Vision. 2425--2433.

4. Yalong Bai, Jianlong Fu, Tiejun Zhao, and Tao Mei. 2018. Deep Attention Neural Tensor Network for Visual Question Answering. In European Conference on Computer Vision. 20--35.

5. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

Cited by 7 articles.

1. MGSGA: Multi-grained and Semantic-Guided Alignment for Text-Video Retrieval;Neural Processing Letters;2024-02-17

2. Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching;IEEE Transactions on Image Processing;2024

3. Relation Triplet Construction for Cross-modal Text-to-Video Retrieval;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26

4. MEME: Multi-Encoder Multi-Expert Framework with Data Augmentation for Video Retrieval;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18

5. Flu-Net: two-stream deep heterogeneous network to detect flu like symptoms from videos using grey wolf optimization algorithm;Journal of Ambient Intelligence and Humanized Computing;2023-03-31