ImageSpirit

Published:2014-12-29 Issue:1 Volume:34 Page:1-11
ISSN:0730-0301
Container-title:ACM Transactions on Graphics
language:en
Short-container-title:ACM Trans. Graph.

Author:

Cheng Ming-Ming¹, Zheng Shuai¹, Lin Wen-Yan², Vineet Vibhav², Sturgess Paul², Crook Nigel², Mitra Niloy J.³, Torr Philip¹

Affiliation:

1. University of Oxford, Oxford, UK

2. Oxford Brookes University, Oxford, UK

3. University College London, London, UK

Abstract

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this article we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that can possibly be used to interact with new-generation devices (e.g., smartphones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.

Funder

European Research Council

Engineering and Physical Sciences Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

Link

https://dl.acm.org/doi/pdf/10.1145/2682628

References: 58 articles.

1. Fast High-Dimensional Filtering Using the Permutohedral Lattice

2. PatchMatch

3. B. Berlin and P. Kay. 1991. Basic Color Terms: Their Universality and Evolution. University of California Press.

4. A Framework for content-adaptive photo manipulation macros

5. “Put-that-there”

Cited by 42 articles.

1. Language-based Photo Color Adjustment for Graphic Designs;ACM Transactions on Graphics;2023-07-26

2. Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation;2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR);2023-06

3. Image Synthesis from Themes Captured in Poems using Latent Diffusion Models;2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC);2023-05-04

4. Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing;2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2023-01

5. Cartoon Image Processing: A Survey;International Journal of Computer Vision;2022-09-01