MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Published: 2022-06
Container-title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author:

Soldan Mattia¹, Pardo Alejandro¹, Alcazar Juan Leon¹, Heilbron Fabian Caba², Zhao Chen¹, Giancola Silvio¹, Ghanem Bernard¹

Affiliation:

1. King Abdullah University of Science and Technology (KAUST)

2. Adobe Research

Funder:

King Abdullah University of Science and Technology (KAUST)

Publisher:

IEEE

Link:

http://xplorestaging.ieee.org/ielx7/9878378/9878366/09879482.pdf?arnumber=9879482

References (41 articles):

1. Video Self-Stitching Graph Network for Temporal Action Localization

2. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language; Songyang; Proceedings of the AAAI Conference on Artificial Intelligence

3. G-TAD: Sub-Graph Localization for Temporal Action Detection

4. Transcript to video: Efficient clip sequencing from texts; Xiong; arXiv preprint; 2021

5. Towards Episodic Memory Support for Dementia Patients by Recognizing Objects, Faces and Text in Eye Gaze

Cited by (25 articles):

1. A dense video caption dataset of student classroom behaviors and a baseline model with boundary semantic awareness; Displays; 2024-09

2. Professional and novice audio describers: quality assessments and audio interactions; The Journal of Specialised Translation; 2024-07-29

3. Language-based machine perception: linguistic perspectives on the compilation of captioning datasets; Digital Scholarship in the Humanities; 2024-06-21

4. Synchronization and Standardization of Open Data Platforms: A Systematic Literature Review; TEM Journal; 2024-05-28

5. TeViS: Translating Text Synopses to Video Storyboards; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26