Author:
Pande Ashtavinayak, Pandey Atul, Solanki Ayush, Shanbhag Chinmay, Motghare Manish
Abstract
We address the task of news-image captioning, which generates a description of an image given the image and the body of its article as input. The aim is to automatically generate captions for news images that can, if needed, serve as reference captions when news-image captions are written manually. This task is more challenging than conventional image captioning because it requires a joint understanding of image and text. We present an N-gram model that integrates the text and image modalities and attends to textual features from visual features when generating a caption. Experiments based on automatic evaluation metrics and human evaluation show that the article text provides the primary information needed to reproduce news-image captions written by journalists. The results also demonstrate that the proposed model outperforms the state-of-the-art model. We also confirm that visual features contribute to improving the quality of news-image captions. Finally, we present a website that takes an image and its associated article as input and generates a one-line caption for the image.
Publisher
Perpetual Innovation Media Pvt. Ltd.