Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images-Reference-Cited by-同舟云学术

Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images

Published:2022-10-22 Issue:11 Volume:8 Page:294
ISSN:2313-433X
Container-title:Journal of Imaging
language:en
Short-container-title:J. Imaging

Author:

Nursikuwagus Agus^ORCID,Munir Rinaldi,Khodra Masayu Leylia

Abstract

Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: background object and foreground object. In contrast to the previous image-captioning research, generating captions from the geological images of rocks is more focused on the background of the images. This study proposed image captioning using a convolutional neural network, long short-term memory, and word2vec to generate words from the image. The proposed model was constructed by a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec and gave a dense output of 256 units. To make it properly grammatical, a sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. An evaluation of the pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models used BLEU score methods for the N-gram. The BLEU scores achieved for BLEU-1 using these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. BLEU-2 showed scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. BLEU-3 performed with scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, BLEU-4 had scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. Our CNN-C model outperformed the other models, especially the baseline model. Furthermore, there are several future challenges in studying captions, such as geological sentence structure, geological sentence phrase, and constructing words by a geological tagger.

Funder

Ministry of Research, Technology and Higher Education, Republic of Indonesia

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Graphics and Computer-Aided Design,Computer Vision and Pattern Recognition,Radiology, Nuclear Medicine and imaging

Link

https://www.mdpi.com/2313-433X/8/11/294/pdf

Reference72 articles.

1. ImageNet classification with deep convolutional neural networks

2. Deep Visual-Semantic Alignments for Generating Image Descriptions

3. Phrase-Based Image Captioning;Lebret;Proceedings of the 32nd International Conference on Machine Learning, ICML,2015

4. A Theoretical Analysis of Feature Pooling in Visual Recognition;Boureau;Proceedings of the International Conference on Machine Learning,2010

5. Gradient-based learning applied to document recognition

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Automatic image caption generation using deep learning;Multimedia Tools and Applications;2023-06-01

2. Image Captioning for Colorectal Cancer Using Deep Learning Approaches;Algorithms for Intelligent Systems;2023