ASSET

Author:

Difan Liu¹, Sandesh Shetty², Tobias Hinz³, Matthew Fisher³, Richard Zhang³, Taesung Park³, Evangelos Kalogerakis²

Affiliation:

1. UMass Amherst and Adobe Research

2. UMass Amherst

3. Adobe Research

Abstract

We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are either computationally too expensive for handling high-resolution images or overly constrained within specific image regions, hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnet and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method. Our code and dataset are available at our project page: https://github.com/DifanLiu/ASSET
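The key idea in the abstract, sparsifying high-resolution attention using dense attention computed at a lower resolution, can be illustrated with a minimal sketch. This is not the paper's implementation; it is a simplified 1-D analogue with hypothetical names (`sparsified_attention`, `top_k`), assuming each low-resolution token corresponds to a contiguous block of high-resolution tokens and that each high-resolution query attends only to the blocks its low-resolution counterpart attended to most strongly:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparsified_attention(q_hi, k_hi, v_hi, attn_lo, top_k):
    """Sparse high-res attention guided by dense low-res attention.

    q_hi, k_hi, v_hi: (Nh, d) high-resolution queries/keys/values.
    attn_lo: (Nl, Nl) dense attention computed at low resolution,
             where each low-res token covers a block of Nh // Nl
             high-res tokens (1-D analogue of an image patch).
    """
    Nh, d = q_hi.shape
    Nl = attn_lo.shape[0]
    block = Nh // Nl  # high-res tokens per low-res token

    out = np.zeros_like(v_hi)
    for i in range(Nh):
        # low-res token that this high-res query falls into
        lo_q = i // block
        # keep only the top-k most attended low-res keys for this query
        top_lo = np.argsort(attn_lo[lo_q])[-top_k:]
        # expand those low-res keys to their high-res token indices
        keys = np.concatenate(
            [np.arange(j * block, (j + 1) * block) for j in top_lo]
        )
        # dense attention restricted to the selected keys only
        scores = q_hi[i] @ k_hi[keys].T / np.sqrt(d)
        out[i] = softmax(scores) @ v_hi[keys]
    return out
```

Each query thus computes attention over only `top_k * block` keys instead of all `Nh`, which is what makes the mechanism tractable at high resolution while still allowing long-range interactions wherever the low-resolution attention pointed.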

Funder

Adobe Research

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design


Cited by 2 articles.

1. Multimodal Image Synthesis and Editing: The Generative AI Era. IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2023.

2. SIEDOB: Semantic Image Editing by Disentangling Object and Background. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023.
