Context-Fused Guidance for Image Captioning Using Sequence-Level Training

Published: 2022-01-05 Volume: 2022 Page: 1-9
ISSN: 1687-5273
Container-title: Computational Intelligence and Neuroscience
Language: en
Short-container-title: Computational Intelligence and Neuroscience

Author:

Feng Junlong¹, Zhao Jianping¹

Affiliation:

1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China

Abstract

Recent image captioning models based on the encoder-decoder framework have achieved remarkable success in generating humanlike sentences. However, the explicit separation between encoder and decoder creates a disconnect between the image and the sentence. This usually leads to a coarse image description: the generated caption contains only the main instances and neglects additional objects and scenes, which reduces the caption's consistency with the image. To address this issue, we propose an image captioning system with context-fused guidance. It combines regional and global image representations into compositional visual features to learn the objects and attributes in images. To integrate image-level semantic information, visual concepts are employed. To avoid misleading the decoder, a context fusion gate computes the textual context by selectively aggregating information from the visual concept and the word embedding. The context-fused image guidance is then formulated from the compositional visual features and the textual context, providing the decoder with informative semantic knowledge. Finally, a captioner with a two-layer LSTM architecture generates the captions. Moreover, to overcome exposure bias, we train the proposed model at the sequence level, treating caption generation as a sequential decision-making problem. Experiments on the MS COCO dataset show that our model performs strongly, and a linguistic analysis demonstrates that it improves the caption's consistency with the image.
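The abstract does not give the equations for the context fusion gate, so the following is only a minimal sketch of one common way to build such a gate, not the authors' formulation. It assumes the visual-concept vector and the word embedding share the same dimension, that the gate is a sigmoid over a linear map of both inputs, and that the textual context is their element-wise convex combination. The names `context_fusion_gate`, `w_concept`, `w_word`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def context_fusion_gate(concept, word, w_concept, w_word, bias):
    """Blend a visual-concept vector and a word embedding element-wise.

    Assumed form (not taken from the paper):
        gate    = sigmoid(w_concept @ concept + w_word @ word + bias)
        context = gate * concept + (1 - gate) * word
    """
    from_concept = matvec(w_concept, concept)
    from_word = matvec(w_word, word)
    gate = [sigmoid(a + b + c) for a, b, c in zip(from_concept, from_word, bias)]
    # A gate value near 1 keeps the visual concept; near 0 keeps the word embedding.
    context = [g * c + (1.0 - g) * e for g, c, e in zip(gate, concept, word)]
    return gate, context


if __name__ == "__main__":
    # Toy 3-dimensional inputs with hand-picked (untrained) parameters.
    concept = [1.0, 2.0, 3.0]
    word = [3.0, 2.0, 1.0]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    zeros = [[0.0] * 3 for _ in range(3)]
    gate, context = context_fusion_gate(concept, word, identity, zeros, [0.0] * 3)
    print("gate:", gate)
    print("context:", context)
```

In a trained model the two weight matrices and the bias would be learned together with the captioner; here they are fixed by hand only to show how the gate trades off the two sources per dimension.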

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science

Link

http://downloads.hindawi.com/journals/cin/2022/9743123.pdf

Reference: 30 articles.

1. Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs

2. An Overview of Image Caption Generation Methods

3. X-Linear Attention Networks for Image Captioning

4. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

5. Attention on Attention for Image Captioning

Cited by 1 article.

1. A real-time image captioning framework using computer vision to help the visually impaired; Multimedia Tools and Applications; 2023-12-22