Improving Image Captioning with Conditional Generative Adversarial Nets

Published:2019-07-17 Issue: Volume:33 Page:8142-8150
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Chen Chen,Mu Shuai,Xiao Wanpeng,Ye Zexiong,Wu Liesi,Ju Qi

Abstract

In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we are motivated to design some “discriminator” networks to automatically and progressively determine whether a generated caption is human described or machine generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is just a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
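
The abstract does not state the reward formula, so the following is only a minimal sketch of how a discriminator score could extend an RL captioning reward. It assumes the reward is a convex combination of the discriminator's "human-written" probability and a language-metric score, weighted by a hypothetical parameter `lam`, and it assumes a self-critical (greedy-baseline) advantage, which is a common choice in RL-based captioning rather than something the abstract confirms. All function and parameter names are illustrative.

```python
def mixed_reward(disc_prob, metric_score, lam):
    """Blend a discriminator probability with a language-metric score.

    Assumed form: lam * disc_prob + (1 - lam) * metric_score.
    With lam = 0 the discriminator is ignored, which corresponds to
    conventional metric-only RL training.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * disc_prob + (1.0 - lam) * metric_score


def self_critical_advantage(sampled, greedy, lam):
    """Advantage of a sampled caption over a greedy-decoded baseline.

    `sampled` and `greedy` are (disc_prob, metric_score) pairs.
    The self-critical baseline is an assumption of this sketch.
    """
    return mixed_reward(*sampled, lam) - mixed_reward(*greedy, lam)
```

Under these assumptions, setting `lam = 0` reduces the reward to the metric score alone, which is one way to read the abstract's claim that conventional RL training is a special case of the approach.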

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 46 articles.

1. Audio Description of Videos Using Machine Learning;2024 IEEE 9th International Conference for Convergence in Technology (I2CT);2024-04-05

2. ReverseGAN: An intelligent reverse generative adversarial networks system for complex image captioning generation;Displays;2024-04

3. Multi-Keys Attention Network for Image Captioning;Cognitive Computation;2024-01-24

4. CLIP-Prefix for Image Captioning and an Experiment on Blind Image Guessing;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2024

5. Image captioning based on scene graphs: A survey;Expert Systems with Applications;2023-11