A Neural Space-Time Representation for Text-to-Image Personalization

Authors:

Yuval Alaluf1, Elad Richardson1, Gal Metzer1, Daniel Cohen-Or1

Affiliation:

1. Tel Aviv University, Israel

Abstract

A key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity, downstream editability, and disk space needed to store the learned concept. In this paper, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoising U-Net layers (space) and showcase its compelling properties. A single concept in the space-time representation is composed of hundreds of vectors, one for each combination of time and space, making this space challenging to optimize directly. Instead, we propose to implicitly represent a concept in this space by optimizing a small neural mapper that receives the current time and space parameters and outputs the matching token embedding. In doing so, the entire personalized concept is represented by the parameters of the learned mapper, resulting in a compact, yet expressive, representation. Similarly to other personalization methods, the output of our neural mapper resides in the input space of the text encoder. We observe that one can significantly improve the convergence and visual fidelity of the concept by introducing a textual bypass, where our neural mapper additionally outputs a residual that is added to the output of the text encoder. Finally, we show how one can impose an importance-based ordering over our implicit representation, providing users control over the reconstruction and editability of the learned concept using a single trained model. We demonstrate the effectiveness of our approach over a range of concepts and prompts, showing our method's ability to generate high-quality and controllable compositions without fine-tuning any parameters of the generative model itself.
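To make the core idea concrete, the following is a minimal PyTorch sketch of a space-time conditioned mapper of the kind the abstract describes: a small network that receives the current denoising timestep ("time") and U-Net layer index ("space") and outputs both a token embedding for the text encoder's input space and a residual for the textual bypass. The architecture, hidden sizes, and names (SpaceTimeMapper, embed_dim, etc.) are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

class SpaceTimeMapper(nn.Module):
    """Illustrative sketch (not the authors' implementation): maps a
    (denoising timestep, U-Net layer index) pair to a token embedding
    plus a textual-bypass residual."""

    def __init__(self, num_layers: int = 16, embed_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        # Learned embedding for the U-Net layer ("space") and a small
        # projection of the normalized timestep ("time").
        self.layer_embed = nn.Embedding(num_layers, hidden_dim)
        self.time_proj = nn.Linear(1, hidden_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
        )
        # Two heads: one token embedding fed to the text encoder's input,
        # one residual added to the text encoder's output (textual bypass).
        self.to_token = nn.Linear(hidden_dim, embed_dim)
        self.to_bypass = nn.Linear(hidden_dim, embed_dim)

    def forward(self, t: torch.Tensor, layer_idx: torch.Tensor):
        # t: (B,) timesteps normalized to [0, 1]; layer_idx: (B,) long indices
        time_feat = self.time_proj(t.unsqueeze(-1))
        layer_feat = self.layer_embed(layer_idx)
        h = self.net(torch.cat([time_feat, layer_feat], dim=-1))
        return self.to_token(h), self.to_bypass(h)


# Example query for one timestep and one cross-attention layer.
mapper = SpaceTimeMapper()
t = torch.tensor([0.5])        # normalized denoising timestep
layer_idx = torch.tensor([3])  # index of the current U-Net layer
token_embedding, bypass_residual = mapper(t, layer_idx)
print(token_embedding.shape, bypass_residual.shape)  # (1, 768) each
```

In this reading, the learned concept is stored entirely in the mapper's weights rather than as hundreds of explicit per-(time, space) vectors, which is what makes the representation compact.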

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

