Image Captioning Based on Deep Neural Networks

Published:2018 Volume:232 Page:01052
ISSN:2261-236X
Container-title:MATEC Web of Conferences
Short-container-title:MATEC Web Conf.

Author:

Liu Shuang,Bai Liang,Hu Yanli,Wang Haoran

Abstract

With the development of deep learning, the combination of computer vision and natural language processing has attracted great attention in the past few years. Image captioning is a representative task in this field, in which the computer learns to describe the visual content of an image in one or more sentences. Generating a meaningful description of high-level image semantics requires not only recognizing the objects and the scene, but also analyzing the states, attributes and relationships of these objects. Although image captioning is a complicated and difficult task, many researchers have achieved significant improvements. In this paper, we mainly describe three image captioning methods based on deep neural networks: the CNN-RNN based, CNN-CNN based and reinforcement-based frameworks. We then introduce representative work for each of these three methods, describe the evaluation metrics, and summarize the benefits and major challenges.

Publisher

EDP Sciences

Subject

General Medicine

Link

https://www.matec-conferences.org/10.1051/matecconf/201823201052/pdf

References: 24 articles.

1. Krizhevsky Alex, Sutskever I., and Hinton G. E. “ImageNet classification with deep convolutional neural networks.” International Conference on Neural Information Processing Systems, Curran Associates Inc., 1097-1105. (2012)

2. Girshick Ross, et al. “Region-based Convolutional Networks for Accurate Object Detection and Segmentation.” IEEE Transactions on Pattern Analysis & Machine Intelligence 38.1:142-158. (2015)

3. Devlin Jacob, et al. “Language Models for Image Captioning: The Quirks and What Works.” Computer Science (2015)

4. Fang H., et al. “From captions to visual concepts and back.” Computer Vision and Pattern Recognition IEEE, 1473-1482. (2015)

5. Cho Kyunghyun, et al. “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.” Computer Science (2014)

Cited by 23 articles.

1. A novel system for the classification of zinc-plated components by benchmarking deep neural networks;Expert Systems with Applications;2024-12

2. The Optimal Choice of the Encoder–Decoder Model Components for Image Captioning;Information;2024-08-21

3. Image Caption Generation Through the Integration of CNN-Based Residual Network Architectures and LSTM;2024 7th International Conference on Informatics and Computational Sciences (ICICoS);2024-07-17

4. CaptionCraft: VGG with LSTM for Image Insights;2024 1st International Conference on Trends in Engineering Systems and Technologies (ICTEST);2024-04-11

5. An In-Depth Exploration of Image Captioning Training Approaches and Performance Analysis;2024 IEEE 9th International Conference for Convergence in Technology (I2CT);2024-04-05