Affiliation:
1. University of Electronic Science and Technology of China, China
Abstract
Deep-learning-based super-resolution (SR) has achieved superior performance in image reconstruction. Recently, considerable algorithmic effort has been devoted to improving reconstruction quality and speed. However, SR inference involves a huge amount of computation and data access, which leads to low hardware implementation efficiency. For instance, up-sampling by deconvolution requires considerable computation resources. In addition, the output feature maps of several middle layers are extraordinarily large and hard to optimize, which causes serious data access issues. In this work, we present ADAS, an all-on-chip hardware architecture based on a deconvolution scheme and a feature map segmentation strategy, in which all data generated by the middle layers are buffered on-chip to avoid large data movements between on-chip and off-chip memory. In ADAS, we develop a hardware-friendly and efficient deconvolution scheme to accelerate computation. We also propose a dynamically reconfigurable processing element (PE) that, combined with efficient mapping, raises PE utilization to nearly \(100\%\) and supports multiple scaling factors. In our experiments, ADAS demonstrates real-time image SR and better image reconstruction quality, with a PSNR of 37.15 dB and an SSIM of 0.9587. Validated on an FPGA platform, ADAS supports scaling factors of 2, 3, and 4 and achieves 2.68×, 5.02×, and 8.28× speedup compared to the baseline.
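To illustrate why deconvolution-based up-sampling is costly, the following minimal sketch compares two textbook formulations of a 1-D transposed convolution: a scatter form and a zero-insertion form followed by an ordinary convolution. It is not the ADAS scheme, which the abstract does not describe; the function names, the kernel, and the stride are assumptions chosen only for illustration.

```python
def deconv1d_scatter(x, kernel, stride):
    """Transposed convolution: each input sample scatters a scaled kernel copy."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


def deconv1d_zero_insert(x, kernel, stride):
    """Same result via zero insertion plus a full convolution.

    Also returns the number of multiplications performed, most of which
    hit inserted or padded zeros.
    """
    k = len(kernel)
    # Insert (stride - 1) zeros between neighbouring input samples.
    up = [0.0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        up[i * stride] = xi
    # Pad with (k - 1) zeros on both sides for a full convolution.
    padded = [0.0] * (k - 1) + up + [0.0] * (k - 1)
    out, multiplies = [], 0
    for n in range(len(up) + k - 1):
        acc = 0.0
        for j in range(k):
            acc += padded[n - j + k - 1] * kernel[j]
            multiplies += 1
        out.append(acc)
    return out, multiplies


# Hypothetical example values (not taken from the paper).
x = [1.0, 2.0, 3.0]
kernel = [1.0, 2.0, 1.0]
stride = 2

y_scatter = deconv1d_scatter(x, kernel, stride)
y_zero, total_multiplies = deconv1d_zero_insert(x, kernel, stride)
useful_multiplies = len(x) * len(kernel)  # products with a real input sample
print(y_scatter, y_zero, useful_multiplies, total_multiplies)
```

In this toy case the zero-insertion form performs 21 multiplications, of which only 9 involve an actual input sample. That wasted work grows with the scaling factor, which is the kind of inefficiency a hardware-friendly deconvolution scheme aims to remove.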
Publisher
Association for Computing Machinery (ACM)
Cited by
5 articles.