
Video Captioning with Multi-Faceted Attention

Published: 2018-12; Volume: 6; Pages: 173-184
ISSN: 2307-387X
Container-title: Transactions of the Association for Computational Linguistics
Language: en
Short-container-title: TACL

Author:

Long Xiang¹, Gan Chuang¹, de Melo Gerard²

Affiliation:

1. Tsinghua University

2. Rutgers University

Abstract

Video captioning has attracted an increasing amount of interest, due in part to its potential for improved accessibility and information retrieval. While existing methods rely on different kinds of visual features and model architectures, they do not make full use of pertinent semantic cues. We present a unified and extensible framework to jointly leverage multiple sorts of visual features and semantic attributes. Our novel architecture builds on LSTMs with two multi-faceted attention layers. These first learn to automatically select the most salient visual features or semantic attributes, and then yield overall representations for the input and output of the sentence generation component via custom feature scaling operations. Experimental results on the challenging MSVD and MSR-VTT datasets show that our framework outperforms previous work and performs robustly even in the presence of added noise to the features and attributes.
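The attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it shows generic dot-product soft attention applied separately to each group ("facet") of feature vectors, with the per-facet results simply concatenated. The names `attend`, `fuse`, `query`, and `facets` are hypothetical, the query stands in for a decoder state, and plain concatenation is an assumption used in place of the paper's custom feature scaling operations, which the abstract does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product soft attention over a list of equal-length vectors.

    Returns the weighted average of `items` and the attention weights.
    `query` is assumed to have the same dimension as each item.
    """
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    context = [
        sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)
    ]
    return context, weights


def fuse(query, facets):
    """Attend within each facet, then concatenate the per-facet contexts.

    Concatenation is an illustrative assumption, not the paper's fusion step.
    """
    fused = []
    for items in facets.values():
        context, _ = attend(query, items)
        fused.extend(context)
    return fused


if __name__ == "__main__":
    # Hypothetical toy inputs: two frame features and one attribute embedding.
    query = [1.0, 0.0]
    facets = {
        "frame": [[1.0, 0.0], [0.0, 1.0]],
        "attribute": [[0.5, 0.5]],
    }
    print(fuse(query, facets))
```

In this toy setup the first frame vector aligns with the query and so receives the larger weight, while a facet with a single item is passed through unchanged.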

Publisher

MIT Press - Journals

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00013


Cited by 42 articles.

1. Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review; IEEE Access; 2024

2. Emotional Video Captioning With Vision-Based Emotion Interpretation Network; IEEE Transactions on Image Processing; 2024

3. Deep Learning for Video Captioning; Wireless Networks; 2024

4. A comprehensive survey on deep-learning-based visual captioning; Multimedia Systems; 2023-09-21

5. Multimodal attention-based transformer for video captioning; Applied Intelligence; 2023-07-09