Deliberate Attention Networks for Image Captioning

Author:

Gao Lianli,Fan Kaixuan,Song Jingkuan,Liu Xianglong,Xu Xing,Shen Heng Tao

Abstract

In daily life, deliberation is a common behavior for human to improve or refine their work (e.g., writing, reading and drawing). To date, encoder-decoder framework with attention mechanisms has achieved great progress for image captioning. However, such framework is in essential an one-pass forward process while encoding to hidden states and attending to visual features, but lacks of the deliberation action. The learned hidden states and visual attention are directly used to predict the final captions without further polishing. In this paper, we present a novel Deliberate Residual Attention Network, namely DA, for image captioning. The first-pass residual-based attention layer prepares the hidden states and visual attention for generating a preliminary version of the captions, while the second-pass deliberate residual-based attention layer refines them. Since the second-pass is based on the rough global features captured by the hidden layer and visual attention in the first-pass, our DA has the potential to generate better sentences. We further equip our DA with discriminative loss and reinforcement learning to disambiguate image/caption pairs and reduce exposure bias. Our model improves the state-of-the-arts on the MSCOCO dataset and reaches 37.5% BELU-4, 28.5% METEOR and 125.6% CIDEr. It also outperforms the-state-ofthe-arts from 25.1% BLEU-4, 20.4% METEOR and 53.1% CIDEr to 29.4% BLEU-4, 23.0% METEOR and 66.6% on the Flickr30K dataset.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 20 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Benefit from AMR: Image Captioning with Explicit Relations and Endogenous Knowledge;Lecture Notes in Computer Science;2024

2. Image Captioning with Reinforcement Learning;2023 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI);2023-12-10

3. Image Caption And Speech Generation Using LSTM And GTTS API;2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS);2023-08-23

4. A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues;Artificial Intelligence Review;2023-04-17

5. From Less to More: Common-Sense Semantic Perception Benefits Image Captioning;Web and Big Data;2023

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3