Authors:
Chang Yeong-Hwa, Chen Yen-Jen, Huang Ren-Hung, Yu Yi-Ting
Abstract
Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model that combines object detection, color analysis, and caption generation is proposed to automatically generate textual descriptions of images. In the encoder–decoder captioning model, VGG16 is used as the encoder and an LSTM (long short-term memory) network with attention is used as the decoder. In addition, Mask R-CNN with OpenCV is used for object detection and color analysis. The image caption and the recognized colors are then integrated to provide more descriptive details of the images. Finally, the generated sentence is converted into speech. The validation results show that the proposed method provides more accurate descriptions of images.
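To make the encoder–decoder design concrete, below is a minimal Keras/TensorFlow sketch of such a captioning model: VGG16 convolutional features serve as the encoder output, and an LSTM decoder attends over the image regions at each word step. The vocabulary size, caption length, layer widths, and the dot-product `Attention` layer are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a VGG16 encoder + attention-LSTM decoder for image captioning.
# All hyperparameters below are assumed for illustration.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

VOCAB_SIZE = 10000   # assumed vocabulary size
MAX_LEN = 30         # assumed maximum caption length
EMBED_DIM = 256
UNITS = 512

# Encoder: VGG16 feature maps, reshaped into a set of 49 region vectors.
base = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
image_in = layers.Input(shape=(224, 224, 3))
feats = base(image_in)                        # (7, 7, 512) feature map
feats = layers.Reshape((49, 512))(feats)      # 49 spatial regions x 512 dims

# Decoder: embed the partial caption, run an LSTM, and attend over regions.
caption_in = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
lstm_out = layers.LSTM(UNITS, return_sequences=True)(emb)
# Dot-product attention over image regions (a stand-in for the paper's
# attention mechanism, whose exact form is not specified in the abstract).
context = layers.Attention()([lstm_out, feats])
merged = layers.Concatenate()([lstm_out, context])
out = layers.TimeDistributed(layers.Dense(VOCAB_SIZE, activation="softmax"))(merged)

model = Model([image_in, caption_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, such a model is typically run one word at a time (feeding the previously predicted word back in) with greedy or beam-search decoding; training uses teacher forcing with the ground-truth caption shifted by one position.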
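The color-analysis step can also be sketched briefly. Assuming Mask R-CNN supplies a binary mask per detected object, OpenCV can estimate a coarse color name for the masked region from its HSV statistics; the hue buckets and thresholds below are illustrative assumptions, not the paper's color categories.

```python
# Sketch: name the dominant color of a masked region using OpenCV HSV values.
import cv2
import numpy as np

# Assumed coarse hue buckets (OpenCV hue range is 0-179).
HUE_NAMES = [(10, "red"), (25, "orange"), (35, "yellow"),
             (85, "green"), (130, "blue"), (160, "purple"), (180, "red")]

def dominant_color(image_bgr: np.ndarray, mask: np.ndarray) -> str:
    """Return a coarse color name for the masked region of a BGR image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    region = mask.astype(bool)
    sat, val = s[region], v[region]
    # Treat low-value / low-saturation regions as achromatic.
    if val.mean() < 50:
        return "black"
    if sat.mean() < 40:
        return "white" if val.mean() > 180 else "gray"
    hue = np.median(h[region])
    for upper, name in HUE_NAMES:
        if hue <= upper:
            return name
    return "unknown"

# Usage with a synthetic red image and a full-frame stand-in mask:
img = np.full((200, 200, 3), (0, 0, 255), np.uint8)   # pure red in BGR
mask = np.ones(img.shape[:2], np.uint8)
print("a", dominant_color(img, mask), "object")       # -> "a red object"
```

The resulting color word can be merged with the detected object label into the generated caption, and the final sentence can be passed to a text-to-speech engine (for example, pyttsx3's `say()` followed by `runAndWait()`) for audio output.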
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by: 12 articles.