Universal Image Restoration with Text Prompt Diffusion
Authors:
Yu Bing1, Fan Zhenghui1, Xiang Xue1, Chen Jiahui1, Huang Dongjin1
Affiliation:
1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
Abstract
Universal image restoration (UIR) aims to accurately restore images with a variety of unknown degradation types and levels. Existing methods, both learning-based and prior-based, rely heavily on low-quality image features. However, extracting degradation information from diverse low-quality images is challenging, which limits model performance; inaccurate degradation estimates further reduce restoration quality, yielding suboptimal results. A viable way to enhance UIR performance is to introduce additional priors. To address these issues, we propose an effective framework based on a diffusion model (DM) for universal image restoration, dubbed ETDiffIR. Inspired by the remarkable performance of text prompts in image generation, we employ a text prompt corresponding to the low-quality image to assist the diffusion model in restoring it. Specifically, we propose a novel text–image fusion block that combines the CLIP text encoder with the DA-CLIP image controller, integrating the text prompt encoding and the degradation type encoding into the time step encoding. Moreover, to reduce the computational cost of the denoising UNet in the diffusion model, we develop an efficient restoration U-shaped network (ERUNet) that achieves favorable noise prediction performance via depthwise and pointwise convolutions. We evaluate the proposed method on image dehazing, deraining, and denoising tasks; the experimental results demonstrate the superiority of our algorithm.
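The efficiency claim for ERUNet rests on the standard parameter-count argument for depthwise-separable convolution. The sketch below illustrates that arithmetic only; the channel and kernel sizes are illustrative assumptions, not values taken from the paper.

```python
# Why depthwise + pointwise convolutions cut cost: parameter counts for a
# full k x k convolution vs. a depthwise-separable one (biases omitted).
# Layer sizes below are illustrative, not ERUNet's actual configuration.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """k x k depthwise conv (one spatial filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = k * k * c_in   # per-channel spatial filtering
    pointwise = c_in * c_out   # 1x1 channel mixing
    return depthwise + pointwise

c_in, c_out, k = 64, 64, 3
full = standard_conv_params(c_in, c_out, k)             # 36864
separable = depthwise_separable_params(c_in, c_out, k)  # 576 + 4096 = 4672
print(full, separable, round(full / separable, 1))      # ~7.9x fewer parameters
```

For equal input/output channel counts the saving factor approaches k², which is why this decomposition is a common way to lighten a denoising UNet.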
Funder
Shanghai Natural Science Foundation; Development Fund for Shanghai Talents