Fraction Execution Resolver Using a Hybrid Multi-CPU/GPU Encoding Scheme

Author:

Papaioannou Georgios I. (1), Koziri Maria (2), Loukopoulos Thanasis (1), Anagnostopoulos Ioannis (1)

Affiliation:

1. Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece

2. Department of Informatics and Telecommunications, University of Thessaly, 35131 Lamia, Greece

Abstract

Modern video coding standards use sub-pixel motion estimation to improve video quality and reduce bitrate. The fraction motion estimation (FME) stage follows the integer motion estimation (IME) stage and adds computational overhead because of the interpolation and the additional motion searches it requires. In this paper, we propose a fraction execution resolver (FER) algorithm that lets the encoder skip the fraction part when specific criteria are met, by introducing a preliminary fast test decision point (pFTDP) function for the IME part. If the pFTDP returns zero motion vectors (MVs) and the center of the displacement search area is also zero, the fraction part is skipped. The pFTDP decision maker is executed only once, when a 2N × 2N block is first met; all subsequent blocks follow this initial decision, either by receiving the necessary MVs and rate-distortion (RD) values from the pFTDP function or by using the IME values precalculated by the GPU kernel. For our experiments, we use a multithreaded CPU environment that uses GPUs only for the integer part. Our evaluations show an encoding time saving of more than 1600% at its peak compared with the default HEVC sequential mode and, ideally, of more than 2286% for still video frame sequences. The total average speedup across the Class A and Class B video sequences is ×13.45. The gain of the FER itself is more than ×3.9 compared with the same multithreaded setup environment. The PSNR and bitrate overhead observed are proportional to the tiling scheme used and are related mainly to the way CABAC works internally. The FER’s negative effects on coding efficiency are proven to be negligible. A balance between speed and quality, achieved by using a lower tiling pattern, is shown to minimize the negative effects of the encoding scheme pattern.
The experimental results confirm the validity of our motivation, namely, that we can benefit from a software fraction execution resolver without any extra hardware costs. The gain increases further for video sequences that contain more static blocks.
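
The skip rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FerState`, `resolve_fraction_skip`, and `run_pftdp` are hypothetical, and the sketch assumes that motion vectors and the search area center are plain integer pairs. It shows only the two properties stated above: the pFTDP is evaluated once per 2N × 2N block, and the fraction part is skipped only when both the returned MV and the search center are zero.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MotionVector = Tuple[int, int]  # assumed (x, y) integer displacement


@dataclass
class FerState:
    """Hypothetical per-2Nx2N-block cache of the FER decision."""
    skip_fraction: Optional[bool] = None  # None until the pFTDP has run


def resolve_fraction_skip(
    state: FerState,
    run_pftdp: Callable[[], MotionVector],
    search_center: MotionVector,
) -> bool:
    """Return True if the fraction (FME) part should be skipped.

    The pFTDP callable is invoked only the first time this block is seen;
    later calls reuse the cached decision, as the abstract describes.
    """
    if state.skip_fraction is None:
        mv = run_pftdp()  # stand-in for the preliminary fast test
        state.skip_fraction = (mv == (0, 0) and search_center == (0, 0))
    return state.skip_fraction


# Usage: a static block (zero MV, zero search center) skips the FME stage.
state = FerState()
skip_first = resolve_fraction_skip(state, lambda: (0, 0), (0, 0))
skip_again = resolve_fraction_skip(state, lambda: (5, 5), (0, 0))  # cached
```

In the usage lines, the second call passes a different pFTDP stand-in, yet the cached decision from the first call is returned, mirroring the statement that subsequent blocks follow the initial decision.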

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

