Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval

Author:

Shi Yaya (1), Liu Haowei (2), Xu Haiyang (3), Ma Zongyang (2), Ye Qinghao (3), Hu Anwen (3), Yan Ming (3), Zhang Ji (3), Huang Fei (3), Yuan Chunfeng (2), Li Bing (2), Hu Weiming (2), Zha Zheng-Jun (1)

Affiliation:

1. University of Science and Technology of China, Hefei, China

2. Institute of Automation, CAS & University of Chinese Academy of Sciences, Beijing, China

3. DAMO Academy, Alibaba Group, Hangzhou, China

Publisher

ACM

References (52 articles):

1. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text. arXiv preprint arXiv:2104.11178 (2021).

2. Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, and Du Tran. 2020. Self-Supervised Learning by Cross-Modal Audio-Video Clustering. In NeurIPS.

3. Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, and Bryan Russell. 2017. Localizing moments in video with natural language. In ICCV. 5803--5812.

4. Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, and Lele Cheng. 2022. LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval. arXiv:2207.04858 [cs.CV] (2022).

5. Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, and Qun Liu. 2020. SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval. ArXiv, Vol. abs/2010.00768 (2020).
