Omniscient Video Super-Resolution with Explicit-Implicit Alignment

Published:2024-02-07 Issue:5 Volume:20 Page:1-23
ISSN:1551-6857
Container-title:ACM Transactions on Multimedia Computing, Communications, and Applications
language:en
Short-container-title:ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Yi Peng¹, Wang Zhongyuan¹, Luo Laigan², Jiang Kui¹, He Zheng¹, Jiang Junjun³, Lu Tao⁴, Ma Jiayi²

Affiliation:

1. School of Computer Science, Wuhan University, China

2. The Electronic Information School, Wuhan University, China

3. The School of Computer Science and Technology, Harbin Institute of Technology, China

4. The School of Computer Science and Engineering, Wuhan Institute of Technology, China

Abstract

When considering temporal relationships, most previous video super-resolution (VSR) methods follow the iterative or recurrent framework. The iterative framework adopts neighboring low-resolution (LR) frames from a sliding window, while the recurrent framework utilizes the output generated in the previous SR procedure. The hybrid framework combines them but still cannot fully leverage the temporal relationships. Meanwhile, existing methods are limited by the receptive field of the optical flow or lack semantic constraints on motion information. In this work, we propose an omniscient framework to fully explore the temporal relationships in the video, which encompasses both LR frames and SR outputs from the past, present, and future. The omniscient framework is more generic because the iterative, recurrent, and hybrid frameworks can be regarded as its special cases. Besides, when addressing motion information, most previous VSR methods adopt explicit motion estimation and compensation, while many recent methods turn to implicit alignment. Among implicit alignment methods, basic non-local means suffers from heavy computational costs, so we improve it by capturing the non-local correlations in a relatively local manner to reduce the complexity. Moreover, we integrate the explicit and implicit methods into an explicit-implicit alignment module to better utilize motion information. We have conducted extensive experiments on public datasets, which show that our method is superior to the state-of-the-art methods in objective metrics, subjective visual quality, and complexity. In particular, on the Vid4 and UDM10 datasets, our method improves PSNR by 0.19 dB and 0.49 dB, respectively, over the most advanced method, BasicVSR++.
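The abstract's idea of capturing non-local correlations "in a relatively local manner" can be illustrated with a small sketch. This is not the paper's alignment module, whose details the abstract does not give. It is a generic windowed variant of non-local aggregation, written under stated assumptions: features are nested lists of shape [H][W][C], similarity is a scaled dot product, and the names `local_align` and `radius` are hypothetical. A full non-local comparison matches each of the H·W pixels against all H·W pixels of the neighboring frame, while the windowed version compares each pixel with only (2·radius+1)² candidates.

```python
import math

def local_align(ref, nbr, radius=1):
    """Aggregate neighbor-frame features toward the reference frame.

    ref, nbr: nested lists of floats with shape [H][W][C].
    Each output pixel is a softmax-weighted sum of the neighbor-frame
    features inside a (2*radius+1)^2 window around the same position.
    Illustrative sketch only; not the module from the paper.
    """
    h, w, c = len(ref), len(ref[0]), len(ref[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Candidates: the window around (y, x), clamped at the borders.
            cands = [
                nbr[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            # Scaled dot-product similarity to the reference pixel.
            scores = [
                sum(a * b for a, b in zip(ref[y][x], cand)) / math.sqrt(c)
                for cand in cands
            ]
            # Softmax over the window (shifted by the max for stability).
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            # Weighted average of the candidate features, per channel.
            row.append([
                sum(wt * cand[ch] for wt, cand in zip(weights, cands)) / total
                for ch in range(c)
            ])
        out.append(row)
    return out
```

With `radius=0` the window holds a single candidate, so the function returns the neighbor features unchanged. Larger radii let each pixel draw on nearby positions at a cost that grows with the window area rather than with the frame size.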

Funder

National Natural Science Foundation of China

Key R&D Program of Hubei Province

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3640346

References: 55 articles.

1. MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement;Bao Wenbo;IEEE Transactions on Pattern Analysis and Machine Intelligence,2021

2. Maximum a posteriori video super-resolution using a new multichannel image prior;Belekos Stefanos P.;IEEE Transactions on Image Processing,2010

3. Real-time video super-resolution with spatio-temporal networks and motion compensation;Caballero Jose;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2017

4. Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. 2021. BasicVSR: The search for essential components in video super-resolution and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4947–4956.

5. Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, and Chen Change Loy. 2022. BasicVSR++: Improving video super-resolution with enhanced propagation and alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5972–5981.