Reading-Strategy Inspired Visual Representation Learning for Text-to-Video Retrieval

Published: 2022-08 Issue: 8 Volume: 32 Page: 5680-5694
ISSN: 1051-8215
Container-title: IEEE Transactions on Circuits and Systems for Video Technology
Short-container-title: IEEE Trans. Circuits Syst. Video Technol.

Author:

Dong Jianfeng¹, Wang Yabing¹, Chen Xianke¹, Qu Xiaoye², Li Xirong³, He Yuan⁴, Wang Xun¹

Affiliation:

1. College of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, China

2. School of Electronic Information and Communications, Huazhong University of Science and Technology, Hubei, China

3. Key Laboratory of Data Engineering and Knowledge Engineering and the AIMC Laboratory, School of Information, Renmin University of China, Beijing, China

4. Alibaba Group, Beijing, China

Funder

National Key Research and Development Program of China

NSFC

Public Welfare Technology Research Project of Zhejiang Province

Research Program of Zhejiang Laboratory

Open Projects Program of National Laboratory of Pattern Recognition

Fundamental Research Funds for the Provincial Universities of Zhejiang

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Subject

Electrical and Electronic Engineering, Media Technology

Link

http://xplorestaging.ieee.org/ielx7/76/9849156/09709794.pdf?arnumber=9709794

References: 82 articles.

1. Fusion of Multimodal Embeddings for Ad-Hoc Video Search

2. Auto-captions on GIF: A large-scale video-sentence dataset for vision-language pre-training; Pan; arXiv:2007.02375, 2020

3. TVQA: Localized, Compositional Video Question Answering

4. AVLnet: Learning audio-visual language representations from instructional videos; Rouditchenko; arXiv:2006.09199, 2020

5. Universal Weighting Metric Learning for Cross-Modal Retrieval

Cited by 33 articles.

1. Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-09-12

2. Transferable dual multi-granularity semantic excavating for partially relevant video retrieval; Image and Vision Computing; 2024-09

3. DI-VTR: Dual inter-modal interaction model for video-text retrieval; Journal of Information and Intelligence; 2024-09

4. Statistics Enhancement Generative Adversarial Networks for Diverse Conditional Image Synthesis; IEEE Transactions on Circuits and Systems for Video Technology; 2024-07

5. Multilevel Semantic Interaction Alignment for Video–Text Cross-Modal Retrieval; IEEE Transactions on Circuits and Systems for Video Technology; 2024-07