Perceptual Quality-Oriented Rate Allocation via Distillation from End-to-End Image Compression

Published:2024-04-25 Issue:7 Volume:20 Page:1-22
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Yang Runyu¹, Liu Dong¹, Ma Siwei², Wu Feng¹, Gao Wen³

Affiliation:

1. University of Science and Technology of China, Hefei, China

2. Peking University, Beijing, China

3. Peng Cheng Laboratory, Shenzhen, China and Peking University, Beijing, China

Abstract

Mainstream image/video coding standards, exemplified by the state-of-the-art H.266/VVC, AVS3, and AV1, follow the block-based hybrid coding framework. Because of this framework, encoders designed for these standards are easily optimized for peak signal-to-noise ratio (PSNR) but are difficult to optimize for metrics that align better with perceptual quality, e.g., multi-scale structural similarity (MS-SSIM), since these metrics cannot be accurately evaluated at the small-block level. We address this problem by drawing inspiration from end-to-end image compression built on deep networks, which can be optimized through network training for any metric as long as the metric is differentiable. We compared models trained with the same network structure but different metrics and observed that they allocate rates in different ratios. We then propose a distillation method that obtains the rate allocation rule from end-to-end image compression models trained with different metrics and applies the rule in block-based encoders. We implement the proposed method on the VVC reference software (VTM) and the AVS3 reference software (HPM), focusing on intra-frame coding. Experimental results show that the proposed method on top of VTM achieves more than 10% BD-rate reduction relative to the anchor when evaluated with MS-SSIM or LPIPS, which corresponds to a concrete improvement in perceptual quality.
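
The BD-rate figure quoted in the abstract is the average bitrate difference between two codecs at equal quality, in percent (negative means the tested codec needs fewer bits). The following is a minimal sketch of that idea, not the authors' evaluation code. It assumes each rate-quality curve is given as a list of (bitrate, quality) pairs with distinct quality values, and it uses piecewise-linear interpolation of log-bitrate in place of the cubic or piecewise-cubic fit used by common BD-rate tools. The names `bd_rate`, `_interp`, and `_mean_log_rate` are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside the interpolation range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average of log(bitrate) over the quality interval [lo, hi]."""
    points = sorted(points, key=lambda p: p[1])  # sort by quality
    quals = [q for _, q in points]
    logs = [math.log(r) for r, _ in points]
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = min(lo + k * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule
        total += weight * _interp(x, quals, logs)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Average bitrate change of `test` relative to `anchor`, in percent.

    Both arguments are lists of (bitrate, quality) pairs. The comparison
    is restricted to the quality interval covered by both curves.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0
```

Under these assumptions, a test curve that needs 10% fewer bits than the anchor at every quality point yields a value of about -10, and the quality axis can hold any metric where higher is better (PSNR, MS-SSIM, or a negated distance such as LPIPS).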

Funder

Natural Science Foundation of China

Fundamental Research Funds for the Central Universities

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3650034
