MM-AU: Towards Multimodal Understanding of Advertisement Videos

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Bose Digbalay¹, Hebbar Rajat¹, Feng Tiantian¹, Somandepalli Krishna², Xu Anfeng¹, Narayanan Shrikanth¹

Affiliation:

1. University of Southern California, Los Angeles, CA, USA

2. Google Research, New York, NY, USA

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3612371

References: 75 articles.

1. Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Apostol Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A Large-Scale Video Classification Benchmark. ArXiv, Vol. abs/1609.08675 (2016).

2. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, and Boqing Gong. 2021. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. In Neural Information Processing Systems.

3. Max Bain, Arsha Nagrani, Andrew Brown, and Andrew Zisserman. 2020. Condensed movies: Story based retrieval with contextual embeddings. In Proceedings of the Asian Conference on Computer Vision.

4. LIRIS-ACCEDE: A Video Database for Affective Content Analysis

5. MovieCLIP: Visual Scene Recognition in Movies

Cited by 1 article.

1. News-MESI: A Dataset for Multimodal News Excerpt Segmentation and Identification. IEEE Transactions on Emerging Topics in Computational Intelligence. 2024-08.