Object-Centric Image Generation from Layouts

Published:2021-05-18 Issue:3 Volume:35 Page:2647-2655
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
language:
Short-container-title:AAAI

Author:

Sylvain Tristan, Zhang Pengchuan, Bengio Yoshua, Hjelm R Devon, Sharma Shikhar

Abstract

We begin with the hypothesis that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes with multiple objects well. Our layout-to-image-generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which improve our model's layout fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its object instance-awareness. Apart from improving image quality, our contributions mitigate two failure modes in previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of evaluation metrics used in previous works by introducing SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
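
The abstract does not spell out how SceneFID is computed. The following is only a minimal sketch of the general idea, assuming that an object-centric variant of the Fréchet Inception Distance crops each object out of the image by its bounding box and then applies the standard Fréchet distance to features of those crops. The function names (crop_objects, frechet_distance) and the (x0, y0, x1, y1) box convention are hypothetical, and the feature extractor is left to the caller: standard FID uses features from an Inception network, which this sketch does not include.

```python
import numpy as np


def crop_objects(image, boxes):
    """Cut one crop per bounding box out of an H x W x C image array.

    Boxes are (x0, y0, x1, y1) in pixels; this convention is an assumption
    made for the sketch, not something stated in the abstract.
    """
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature sets.

    Computes ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 Tr((C_a C_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # For positive semi-definite covariances, the trace of the matrix square
    # root of C_a C_b equals the sum of the square roots of its eigenvalues.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

In use, one would collect crops from real and generated images with crop_objects, map every crop to a feature vector with a chosen network, stack the vectors into two (n, d) arrays, and pass them to frechet_distance. Scoring crops rather than whole images is what would make the metric sensitive to the quality of individual objects in a multi-object scene.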

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 31 articles.

1. SMFS‐GAN: Style‐Guided Multi‐class Freehand Sketch‐to‐Image Synthesis;Computer Graphics Forum;2024-08-07

2. Object–attribute–relation model-based semantic coding for image transmission;Journal of the Franklin Institute;2024-07

3. Image Generation from Hyper Scene Graph with Multiple Types of Trinomial Hyperedges;SN Computer Science;2024-06-07

4. MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis;International Journal of Computer Vision;2024-04-02

5. Boundary-aware GAN for multiple overlapping objects in layout-to-image generation;Multimedia Systems;2024-03-21