Affiliation:
1. Max‐Planck‐Institut für Informatik, Germany
Abstract
Full‐reference image quality metrics (FR‐IQMs) aim to measure the visual differences between a pair of reference and distorted images, with the goal of accurately predicting human judgments. However, existing FR‐IQMs, including traditional ones such as PSNR and SSIM and even perceptual ones such as HDR‐VDP, LPIPS, and DISTS, still fall short of capturing the complexities and nuances of human perception. In this work, rather than devising a novel IQM model, we seek to improve the perceptual quality of existing FR‐IQM methods. We achieve this by considering visual masking, an important characteristic of the human visual system that changes its sensitivity to distortions as a function of local image content. Specifically, for a given FR‐IQM metric, we propose to learn a visual masking model that modulates the reference and distorted images so that visual errors are penalized according to their visibility. Since ground‐truth visual masks are difficult to obtain, we demonstrate how they can be derived in a self‐supervised manner solely from mean opinion scores (MOS) collected on an FR‐IQM dataset. Our approach yields enhanced FR‐IQM metrics that align more closely with human judgments, both visually and quantitatively.
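To make the pipeline described above concrete, the following is a minimal sketch of the idea in PyTorch. It assumes the masking model is a small convolutional network that predicts a per‐pixel mask from the reference image, that the mask multiplicatively modulates both images before they enter a differentiable base metric, and that training regresses the masked metric toward MOS. The network `MaskNet`, the helpers `masked_metric` and `train_step`, and all architecture and loss choices are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a learned visual-masking wrapper around an
# existing differentiable FR-IQM; names and architecture are assumptions.
import torch
import torch.nn as nn


class MaskNet(nn.Module):
    """Predicts a positive per-pixel visual mask from the reference image."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # keep mask > 0
        )

    def forward(self, ref):
        return self.net(ref)


def masked_metric(base_metric, mask_net, ref, dist):
    # Modulate both images by the predicted mask so that differences in
    # regions where distortions are less visible contribute less to the
    # base metric's error, per the masking idea in the abstract.
    m = mask_net(ref)
    return base_metric(ref * m, dist * m)


def train_step(base_metric, mask_net, opt, ref, dist, mos):
    # Self-supervised in the sense of the abstract: the only supervision
    # is the MOS value paired with each (reference, distorted) pair.
    opt.zero_grad()
    pred = masked_metric(base_metric, mask_net, ref, dist)
    loss = nn.functional.mse_loss(pred, mos)  # regress toward MOS
    loss.backward()
    opt.step()
    return loss.item()
```

In this sketch the base metric itself stays frozen; only the masking network is optimized, which is one plausible way to read "improve upon existing FR‐IQM methods" without retraining the metrics themselves.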