Affiliation:
1. The Purple Academy of Culture & Creativity, Nanjing University of the Arts, Nanjing, Jiangsu, China
2. Faculty of Humanities and Arts, Macau University of Science and Technology, Macau, China
3. College of Art, Jiangsu University, Zhenjiang, Jiangsu, China
4. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Abstract
Diffusion models can generate high-quality images and have attracted increasing attention. However, they adopt a progressive optimization process and often suffer from long training and inference times, which limits their application in realistic scenarios. Recently, some latent-space diffusion models have partially accelerated training by operating on parameters in the feature space, but their additional network structures still incur a large amount of unnecessary computation. We therefore propose the Contour Wavelet Diffusion method to accelerate both training and inference. First, we introduce the contour wavelet transform to extract anisotropic low-frequency and high-frequency components from the input image, and achieve acceleration by processing these down-sampled components; because wavelet transforms have good reconstructive properties, the quality of generated images is maintained. Second, we propose a batch-normalized stochastic attention module that enables the model to focus effectively on important high-frequency information, further improving the quality of image generation. Finally, we propose a balanced loss function that further improves the convergence speed of the model. Experimental results on several public datasets show that our method significantly accelerates the training and inference of the diffusion model while preserving the quality of generated images.
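The acceleration described above rests on a general property of wavelet transforms: an image can be split into a low-frequency subband and several high-frequency subbands, each at half the input resolution, and then reconstructed exactly. The sketch below illustrates this with the standard one-level 2D Haar transform in NumPy; it is not the paper's contour wavelet transform (which uses anisotropic directional filters), and the function names are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar transform: returns LL, LH, HL, HH subbands,
    each half the resolution of the (even-sized) input image."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0  # row-pair average
    d = (img[0::2, :] - img[1::2, :]) / 2.0  # row-pair difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0     # low-low (coarse image)
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0     # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0     # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0     # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: perfect reconstruction."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d = np.empty_like(a)
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img
```

A denoising network run on the four half-resolution subbands instead of the full image processes roughly a quarter of the spatial locations per layer, which is the source of the speed-up, while the exact inverse transform preserves reconstruction quality.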