GDUI: Guided Diffusion Model for Unlabeled Images

Published: 2024-03-18 Volume: 17 Issue: 3 Page: 125
ISSN: 1999-4893
Journal: Algorithms
Language: en

Author:

Xie Xuanyuan¹, Zhao Jieyu¹

Affiliation:

1. Mobile Network Application Technology Laboratory, School of Information Science and Engineering, Ningbo University, 818 Fenghua Road, Ningbo 315211, China

Abstract

Diffusion models have advanced image synthesis, particularly conditional image synthesis. However, this progress depends heavily on large annotated datasets. To address this dependence, we present the Guided Diffusion model for Unlabeled Images (GDUI) framework. GDUI exploits the inherent feature similarity and semantic differences in the data, together with the downstream transferability of Contrastive Language-Image Pretraining (CLIP), to guide the diffusion model toward high-quality images. We design two semantic-aware algorithms, pseudo-label matching and label-matching refinement, that align the clustering results with the true semantic information and so give the diffusion model more accurate guidance. First, GDUI encodes each image into a semantically meaningful latent vector through clustering. Then, pseudo-label matching matches each image to its true semantic information. Finally, label-matching refinement corrects irrelevant semantic information in the data, improving the quality of the images that the guided diffusion model generates. Experiments on labeled datasets show that GDUI outperforms diffusion models without any guidance and significantly narrows the gap to models guided by ground-truth labels.
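
The three stages named in the abstract (clustering, pseudo-label matching, label-matching refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify either algorithm, so the function names `match_pseudo_labels` and `refine_labels`, the centroid-to-text cosine matching rule, the `margin` threshold, and the 2-D toy vectors standing in for CLIP image and text embeddings are all assumptions made for illustration. Cluster ids are taken as given, as if an earlier clustering step had produced them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def match_pseudo_labels(image_embs, cluster_ids, text_embs):
    """Hypothetical matching rule: map each cluster id to the class name
    whose text embedding is most similar to the cluster centroid."""
    clusters = {}
    for emb, cid in zip(image_embs, cluster_ids):
        clusters.setdefault(cid, []).append(emb)
    mapping = {}
    for cid, members in clusters.items():
        c = centroid(members)
        mapping[cid] = max(text_embs, key=lambda name: cosine(c, text_embs[name]))
    return mapping


def refine_labels(image_embs, cluster_ids, mapping, text_embs, margin=0.2):
    """Hypothetical refinement rule: relabel a sample when some other class
    beats its cluster-level label by more than `margin` in cosine similarity."""
    refined = []
    for emb, cid in zip(image_embs, cluster_ids):
        current = mapping[cid]
        best = max(text_embs, key=lambda name: cosine(emb, text_embs[name]))
        gain = cosine(emb, text_embs[best]) - cosine(emb, text_embs[current])
        refined.append(best if gain > margin else current)
    return refined


# Toy stand-ins for CLIP text embeddings of two class names (assumed data).
text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
# Toy stand-ins for CLIP image embeddings; the last one is deliberately
# placed in the wrong cluster to exercise the refinement step.
image_embs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]
cluster_ids = [0, 0, 1, 1, 0]

mapping = match_pseudo_labels(image_embs, cluster_ids, text_embs)
labels = refine_labels(image_embs, cluster_ids, mapping, text_embs)
print(mapping)
print(labels)
```

In a pipeline of this shape, the refined labels would be passed to a class-conditional diffusion model in place of ground-truth annotations; that conditioning step is outside the scope of this sketch.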

Funder

National Natural Science Foundation of China

National Natural Science Foundation of Zhejiang Province

Publisher

MDPI AG

Link

https://www.mdpi.com/1999-4893/17/3/125/pdf

References (61 articles)

1. Po, R., Yifan, W., Golyanik, V., Aberman, K., Barron, J.T., Bermano, A.H., Chan, E.R., Dekel, T., Holynski, A., and Kanazawa, A. (2023). State of the art on diffusion models for visual computing. arXiv.

2. Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., and Guo, B. (2022, June 18–24). Vector quantized diffusion model for text-to-image synthesis. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.

3. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv.

4. Nichol, A.Q., and Dhariwal, P. (2021, July 18–24). Improved denoising diffusion probabilistic models. Proceedings of the International Conference on Machine Learning, Virtual Event.

5. Dhariwal, P., and Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst.