Denoising Diffusion Models on Model-Based Latent Space

Published:2023-10-28 Issue:11 Volume:16 Page:501
ISSN:1999-4893
Container-title:Algorithms
Language:en
Short-container-title:Algorithms

Author:

Scribano Carmelo¹, Pezzi Danilo¹, Franchini Giorgia¹, Prato Marco¹

Affiliation:

1. Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, 41125 Modena, Italy

Abstract

With the recent advancements in the field of diffusion generative models, it has been shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. This approach, by abstracting away imperceptible image details and introducing substantial spatial compression, renders the learning of the generative process more manageable while significantly reducing computational and memory demands. In this work, we propose to replace autoencoder coding with a model-based coding scheme based on traditional lossy image compression techniques; this choice not only further diminishes computational expenses but also allows us to probe the boundaries of latent-space image generation. Our objectives culminate in the proposal of a valuable approximation for training continuous diffusion models within a discrete space, accompanied by enhancements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work holds promise for enhancing the adaptability of generative diffusion models across diverse data types beyond the realm of imagery.

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/16/11/501/pdf

References: 36 articles.

1. Denoising diffusion probabilistic models;Ho;Adv. Neural Inf. Process. Syst.,2020

2. Diffusion models beat GANs on image synthesis;Dhariwal;Adv. Neural Inf. Process. Syst.,2021

3. Esser, P., Rombach, R., and Ommer, B. (2021, June 20–25). Taming Transformers for High-Resolution Image Synthesis. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.

4. Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. (2021, July 18–24). Zero-shot text-to-image generation. Proceedings of the International Conference on Machine Learning. PMLR, Online.

5. CogView: Mastering Text-to-Image Generation via Transformers;Ding;Proceedings of the Advances in Neural Information Processing Systems,2021