
Deliberate Attention Networks for Image Captioning

Published:2019-07-17 Issue: Volume:33 Page:8320-8327
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Gao Lianli, Fan Kaixuan, Song Jingkuan, Liu Xianglong, Xu Xing, Shen Heng Tao

Abstract

In daily life, deliberation is a common behavior by which humans improve or refine their work (e.g., writing, reading and drawing). To date, the encoder-decoder framework with attention mechanisms has achieved great progress in image captioning. However, such a framework is in essence a one-pass forward process that encodes to hidden states and attends to visual features, but lacks a deliberation step. The learned hidden states and visual attention are used directly to predict the final captions without further polishing. In this paper, we present a novel Deliberate Residual Attention Network, namely DA, for image captioning. The first-pass residual-based attention layer prepares the hidden states and visual attention for generating a preliminary version of the captions, while the second-pass deliberate residual-based attention layer refines them. Since the second pass is based on the rough global features captured by the hidden layer and visual attention in the first pass, our DA has the potential to generate better sentences. We further equip our DA with discriminative loss and reinforcement learning to disambiguate image/caption pairs and reduce exposure bias. Our model improves on the state of the art on the MSCOCO dataset, reaching 37.5% BLEU-4, 28.5% METEOR and 125.6% CIDEr. On the Flickr30K dataset, it also improves the state of the art from 25.1% BLEU-4, 20.4% METEOR and 53.1% CIDEr to 29.4% BLEU-4, 23.0% METEOR and 66.6% CIDEr.
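
The two-pass idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: it assumes plain dot-product attention over a few hand-made region vectors, with no LSTM, learned weights, discriminative loss or reinforcement learning. The function names (`attend`, `residual_attention`, `two_pass`) and the example values are hypothetical and chosen only for illustration. It shows a first pass producing a draft state, and a second pass re-attending to the same regions from that draft.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions):
    # Assumption: dot-product attention; the paper's scoring function may differ.
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(dim)]
    return context, weights

def residual_attention(state, regions):
    # Residual connection: the attended context is added to the incoming state.
    context, weights = attend(state, regions)
    return [s + c for s, c in zip(state, context)], weights

def two_pass(state, regions):
    # First pass: draft state from the initial hidden state.
    draft_state, draft_weights = residual_attention(state, regions)
    # Second pass: re-attend to the same regions, starting from the draft.
    refined_state, refined_weights = residual_attention(draft_state, regions)
    return draft_state, refined_state, draft_weights, refined_weights

# Hypothetical toy inputs: three 2-d "image regions" and one hidden state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.5, 0.1]
draft, refined, w1, w2 = two_pass(state, regions)
```

In this toy setting the second pass places more weight on the region that the first pass already favored, which loosely mirrors the refinement role the abstract assigns to the second-pass layer.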

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 20 articles.

1. Benefit from AMR: Image Captioning with Explicit Relations and Endogenous Knowledge;Lecture Notes in Computer Science;2024

2. Image Captioning with Reinforcement Learning;2023 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI);2023-12-10

3. Image Caption And Speech Generation Using LSTM And GTTS API;2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS);2023-08-23

4. A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues;Artificial Intelligence Review;2023-04-17

5. From Less to More: Common-Sense Semantic Perception Benefits Image Captioning;Web and Big Data;2023