Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Wang Jing¹, Pan Yingwei², Yao Ting², Tang Jinhui¹, Mei Tao²

Affiliation:

1. School of Computer Science and Engineering, Nanjing University of Science and Technology, China

2. JD AI Research, Beijing, China

Abstract

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem is nevertheless not trivial, especially when multiple descriptive and diverse gists must be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate the gists/topics worth mentioning in an image, and then describe the image from one topic to another while keeping a holistic, coherent structure. In this paper, we present a new design, Convolutional Auto-Encoding (CAE), which relies purely on a convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, CAE plus Long Short-Term Memory (dubbed CAE-LSTM), that integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with an attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while the sentence-level LSTM generates one sentence conditioned on each learnt topic. Extensive experiments are conducted on the Stanford image paragraph dataset, and superior results are reported in comparison with state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.
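
The convolutional/deconvolutional auto-encoding idea in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names, the toy sizes (M regions, D feature dimensions, window C, K topics), the ReLU activation, and the use of tied encoder/decoder weights are all assumptions made for illustration, and no training is shown.

```python
import random

def encode(regions, filters):
    """Convolve K filters (each M x C) over an M x D region-feature map.

    Each filter spans all M regions and slides along the feature axis,
    giving one topic vector of length D - C + 1 per filter (assumed layout).
    """
    M, D = len(regions), len(regions[0])
    C = len(filters[0][0])
    topics = []
    for f in filters:
        vec = []
        for j in range(D - C + 1):
            s = sum(f[m][c] * regions[m][j + c] for m in range(M) for c in range(C))
            vec.append(max(0.0, s))  # ReLU (an assumption)
        topics.append(vec)
    return topics

def decode(topics, filters, M, D):
    """Transposed convolution: scatter each topic activation back through
    its filter to rebuild an M x D map (tied weights, an assumption)."""
    C = len(filters[0][0])
    recon = [[0.0] * D for _ in range(M)]
    for vec, f in zip(topics, filters):
        for j, a in enumerate(vec):
            for m in range(M):
                for c in range(C):
                    recon[m][j + c] += a * f[m][c]
    return recon

def reconstruction_loss(regions, recon):
    """Mean squared error between the input map and its reconstruction."""
    n = len(regions) * len(regions[0])
    return sum((x - y) ** 2 for r, q in zip(regions, recon) for x, y in zip(r, q)) / n

# Toy usage with hypothetical sizes and random, untrained filters.
random.seed(0)
M, D, C, K = 4, 10, 3, 2
regions = [[random.random() for _ in range(D)] for _ in range(M)]
filters = [[[random.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(M)]
           for _ in range(K)]
topics = encode(regions, filters)          # K topic vectors
recon = decode(topics, filters, M, D)      # reconstructed region features
loss = reconstruction_loss(regions, recon)
```

In a full model, the filters would be learnt by minimizing the reconstruction loss, and each topic vector would then condition one sentence in the two-level LSTM decoder described above.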

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 16 articles.

1. Memory Guided Transformer With Spatio-Semantic Visual Extractor for Medical Report Generation;IEEE Journal of Biomedical and Health Informatics;2024-05

2. Image paragraph captioning with topic clustering and topic shift prediction;Knowledge-Based Systems;2024-02

3. Comprehensive Relation Modelling for Image Paragraph Generation;Machine Intelligence Research;2024-01-12

4. CLID: Controlled-Length Image Descriptions with Limited Data;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03

5. A Fine-Grained Image Description Generation Method Based on Joint Objectives;Communications in Computer and Information Science;2024