Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Published:2023-07-26 Issue:4 Volume:42 Page:1-13
ISSN:0730-0301
Container-title:ACM Transactions on Graphics
language:en
Short-container-title:ACM Trans. Graph.

Author:

Gal Rinon¹², Arar Moab¹, Atzmon Yuval², Bermano Amit H.¹, Chechik Gal²³, Cohen-Or Daniel¹

Affiliation:

1. Tel Aviv University, Tel Aviv, Israel

2. NVIDIA Research, Tel Aviv, Israel

3. Bar-Ilan University, Tel Aviv, Israel

Abstract

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components. First, an encoder that takes as input a single image of a target concept from a given domain, e.g., a specific face, and learns to map it into a word embedding representing the concept. Second, a set of regularized weight offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps, accelerating personalization from dozens of minutes to seconds while preserving quality. Code and trained encoders will be available at our project page.

Funder

BSF

ISF

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design

Link

https://dl.acm.org/doi/pdf/10.1145/3592133

Reference: 85 articles.

1. Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?

2. Image2StyleGAN++: How to Edit the Embedded Images?

3. Yuval Alaluf, Or Patashnik, and Daniel Cohen-Or. 2021a. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. arXiv preprint arXiv:2104.02699 (2021).

4. Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, and Amit H. Bermano. 2021b. HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing. arXiv:2111.15666 [cs.CV]

5. Artwork personalization at netflix

Cited by 26 articles.

1. An improved StyleGAN-based TextToFace model with Local-Global information Fusion;Expert Systems with Applications;2024-09

2. DisenDreamer: Subject-Driven Text-to-Image Generation With Sample-Aware Disentangled Tuning;IEEE Transactions on Circuits and Systems for Video Technology;2024-08

3. Training-Free Consistent Text-to-Image Generation;ACM Transactions on Graphics;2024-07-19

4. DreamFont3D: Personalized Text-to-3D Artistic Font Generation;Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers '24;2024-07-13

5. The Chosen One: Consistent Characters in Text-to-Image Diffusion Models;Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers '24;2024-07-13