Authors:
Khemiri Randa, Bouaafia Soulef, Bahba Asma, Nasr Maha, Ezahra Sayadi Fatma
Abstract
In motion estimation (ME), block matching algorithms offer great potential for parallelism. The search for the best match is performed by computing the similarity at each block position inside the search area, using a similarity metric such as the Sum of Absolute Differences (SAD). SAD is used in various steps of motion estimation algorithms. Moreover, it can be parallelized on a Graphics Processing Unit (GPU), since the same computation is applied to the pixels of every block, thus offering better results. In this work, a fixed OpenCL code was first run on several architectures, including CPU and GPU; second, a parallel GPU implementation of the SAD process was proposed with CUDA and OpenCL, using block sizes from 4x4 to 64x64. A comparative study of GPU execution times was carried out on the same video sequence. The experimental results indicated that the OpenCL execution time on GPU was better than the CUDA execution time, with a performance ratio reaching a factor of two.
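To make the SAD-based search concrete, the following is a minimal sequential sketch in Python, not the authors' OpenCL or CUDA kernels, which are not shown here. The function names `sad` and `full_search`, the list-of-rows frame layout, and the exhaustive search pattern are all assumptions chosen for illustration. Each candidate displacement is evaluated independently of the others, which is the property that lets a GPU implementation assign candidates (or blocks) to separate threads or work-items.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of Absolute Differences between the n x n block of `cur` at (bx, by)
    and the n x n block of `ref` at (rx, ry). Frames are lists of pixel rows."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching (illustrative, hypothetical helper).

    Tests every displacement (dx, dy) within +/- search_range around (bx, by)
    and returns (best_sad, dx, dy), or None if no candidate fits in the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            # Keep the first candidate with the strictly lowest SAD.
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

In this sketch the block size `n` is a parameter, so the same routine covers the 4x4 to 64x64 range mentioned in the abstract; the two nested loops over `dy` and `dx` are the part a parallel implementation would distribute.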