Affiliation:
1. Delhi Technological University
Abstract
Image captioning has recently become an immensely popular area within computer vision. It aims to generate natural-language sentences that describe the salient parts of a given image. Research in this area is active, and various machine-learning-based image captioning models have been proposed in the literature. The main challenge for existing approaches is extracting image features effectively enough to generate adequate captions; there is also a need to improve the generalizability of results on large and diverse datasets. In this paper, a novel framework, Next-LSTM, is proposed for image captioning. It first extracts image features using ResNeXt, a powerful convolutional-neural-network-based model adopted here for the first time in the image captioning domain. It then applies a long short-term memory (LSTM) network to the extracted features to generate accurate captions. The proposed framework is evaluated on the benchmark Flickr8k dataset using accuracy, and its performance is compared against state-of-the-art approaches, which it outperforms.
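The pipeline the abstract describes, a ResNeXt encoder whose pooled features seed an LSTM decoder over caption tokens, can be sketched in PyTorch as below. This is a minimal illustration, not the paper's implementation: the class name, layer sizes, vocabulary size, and the single grouped-convolution block standing in for a full ResNeXt backbone are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class NextLSTM(nn.Module):
    """Minimal sketch of the Next-LSTM idea: a ResNeXt-style encoder
    (grouped convolutions provide the 'cardinality' dimension) followed
    by an LSTM decoder over caption tokens. All sizes are illustrative,
    not taken from the paper."""

    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=256):
        super().__init__()
        # One ResNeXt-style bottleneck: 1x1 reduce, 3x3 grouped conv
        # (cardinality = 32 groups), 1x1 expand, then global pooling.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=1),
            nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32),
            nn.Conv2d(128, 256, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),  # -> one feature vector per image
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # The image feature seeds the LSTM's initial hidden state.
        self.img_to_h = nn.Linear(256, hidden_dim)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feat = self.encoder(images).flatten(1)              # (B, 256)
        h0 = torch.tanh(self.img_to_h(feat)).unsqueeze(0)   # (1, B, H)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(captions), (h0, c0))  # (B, T, H)
        return self.head(out)                               # (B, T, vocab)

model = NextLSTM()
images = torch.randn(2, 3, 64, 64)                # toy batch of 2 images
captions = torch.randint(0, 1000, (2, 12))        # 12 token ids each
logits = model(images, captions)                  # per-step vocab scores
```

In training, the per-step logits would be compared against the ground-truth next token with cross-entropy; at inference, tokens would be sampled or beam-searched one step at a time from the same decoder.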
Publisher
Research Square Platform LLC