Affiliation:
1. Department of Computer Science and Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
2. Chinese Academy of Sciences, Beijing, China
3. Department of Micro/Nano Electronics, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Abstract
In the current landscape, high-resolution (HR) videos have gained immense popularity, promising an elevated viewing experience. Recent research has demonstrated that video super-resolution (SR) algorithms, empowered by deep neural networks (DNNs), can substantially enhance video quality by reconstructing HR frames from low-resolution (LR) frames. However, existing DNN models demand significant computational resources, posing challenges for deploying SR algorithms on client devices. While numerous accelerators have been proposed, their primary focus remains on client-side optimization. In contrast, our research recognizes that the HR video is originally stored on the cloud server, presenting an untapped opportunity for improving both accuracy and performance. Building on this insight, this paper introduces an end-to-end video CODEC-assisted super-resolution (E²SR+) algorithm, which tightly integrates the cloud server with the client device to deliver a seamless, real-time video viewing experience. We propose a motion vector search algorithm executed on the cloud server, which searches motion vectors and residuals for a subset of HR video frames and packs them as add-ons. We also design an auto-encoder that down-samples the residuals to reduce bitstream cost while preserving residual quality. Lastly, we propose a reconstruction algorithm performed on the client that uses the add-ons to quickly reconstruct the corresponding HR frames, skipping part of the DNN computation. To implement the E²SR+ algorithm, we design the corresponding E²SR+ architecture on the client side, which achieves significant speedup with minimal hardware overhead.
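For illustration only, the following is a minimal sketch of how a client might reconstruct an HR frame from transmitted add-ons (block motion vectors plus residuals) instead of running the full SR network. The function name, block size, and array layouts are our own assumptions, not the paper's implementation.

```python
import numpy as np

def reconstruct_hr_frame(prev_hr, motion_vectors, residuals, block_size=16):
    """Assemble an HR frame by motion-compensating blocks of a previously
    reconstructed HR frame and adding the decoded residuals.

    prev_hr:        (H, W, C) previously reconstructed HR frame
    motion_vectors: (H//block_size, W//block_size, 2) integer (dy, dx) per block
    residuals:      (H, W, C) residual signal decoded from the add-on stream
    H and W are assumed to be multiples of block_size.
    """
    H, W, _ = prev_hr.shape
    out = np.empty_like(prev_hr)
    for by in range(0, H, block_size):
        for bx in range(0, W, block_size):
            dy, dx = motion_vectors[by // block_size, bx // block_size]
            # Clamp the reference block so it stays inside the frame.
            sy = int(np.clip(by + dy, 0, H - block_size))
            sx = int(np.clip(bx + dx, 0, W - block_size))
            out[by:by + block_size, bx:bx + block_size] = \
                prev_hr[sy:sy + block_size, sx:sx + block_size]
    # Add the residuals and clip to the valid 8-bit pixel range.
    return np.clip(out.astype(np.float64) + residuals, 0, 255)
```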
Given that environmental conditions vary across the server-client hierarchy, simply applying E²SR+ to all frames is suboptimal. Accordingly, we offer an environment-aware system that pursues the best performance while adapting to diverse conditions. In the system, we design a linear programming (LP) model that simulates the environment and allocates frames among three existing mechanisms.
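As a rough illustration of the allocation idea, the toy linear program below assigns frames to three mechanisms so as to minimize total client latency under a bitstream budget. All coefficients, variable names, and constraints are hypothetical placeholders, not the authors' actual LP formulation.

```python
# Toy LP: choose how many of a segment's frames go to each of three mechanisms
# (e.g., plain HR delivery, full-DNN SR, E2SR+) to minimize total latency while
# staying within a bit budget. All numbers below are illustrative assumptions.
from scipy.optimize import linprog

latency_per_frame = [5.0, 98.0, 39.0]     # ms per frame for each mechanism (assumed)
bits_per_frame    = [400.0, 50.0, 120.0]  # kbit per frame for each mechanism (assumed)
total_frames      = 100
bit_budget        = 20000.0               # kbit available for this segment (assumed)

res = linprog(
    c=latency_per_frame,          # minimize total client-side latency
    A_ub=[bits_per_frame],        # stay within the bitstream budget
    b_ub=[bit_budget],
    A_eq=[[1.0, 1.0, 1.0]],       # every frame must be assigned to some mechanism
    b_eq=[total_frames],
    bounds=[(0, total_frames)] * 3,
    method="highs",
)
print(res.x)  # (possibly fractional) number of frames per mechanism
```

A real system would round or re-solve this as an integer program and refresh the coefficients as measured bandwidth and compute availability change.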
Our experimental results demonstrate that the E²SR+ algorithm improves PSNR by 1.2 dB, 2.5 dB, and 2.3 dB compared to the SOTA methods EDVR, BasicVSR, and BasicVSR++, respectively. In terms of performance, the E²SR+ architecture offers significant improvements over existing SOTA methods. For instance, while BasicVSR++ requires 98 ms on an NVIDIA V100 GPU to generate a 1280 × 720 HR frame, the E²SR+ architecture reduces the execution time to just 39 ms, highlighting the efficiency and effectiveness of our proposed method. Overall, the E²SR+ architecture achieves 1.4×, 2.2×, 4.6×, and 442.0× performance improvements compared to ADAS, ISRAcc, an NVIDIA V100 GPU, and a CPU, respectively. Lastly, the proposed system surpasses all existing mechanisms in terms of execution time under varying environmental conditions.
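For reference, PSNR, the quality metric quoted above, is computed from the mean squared error between the reconstructed and ground-truth frames. The snippet below shows the standard definition for 8-bit frames; it is included for context and is not code from the paper.

```python
# Standard PSNR definition for 8-bit images (reference only).
import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)
```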
Publisher
Association for Computing Machinery (ACM)