Comparative analysis of software optimization methods in context of branch predication on GPUs

Published: 2021-12-02; Volume: 9; Issue: 6; Pages: 7–15
ISSN: 2500-316X
Journal: Russian Technological Journal
Short title: Rossijskij tehnologičeskij žurnal

Authors:

Sesin I. Yu.¹, Bolbakov R. G.¹

Affiliation:

1. MIREA – Russian Technological University

Abstract

General-Purpose computing on Graphics Processing Units (GPGPU) is a powerful technology for offloading data-parallel processing tasks to Graphics Processing Units (GPUs). It is used in a variety of domains, from science and commerce to hobbyist projects. General-purpose programs running on GPUs inevitably run into performance issues stemming from branch predication. Predication is a GPU feature whereby both sides of a conditional branch are executed and the results of the branch not taken are masked out. This leads to considerable performance losses for GPU programs that place large amounts of code behind conditional operators. This paper analyzes existing approaches to improving software performance with a view to mitigating this loss. Each approach is described together with its advantages, drawbacks, range of applicability, and whether it addresses the outlined problem. The approaches covered are optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly target CPU-specific issues and are generally not applicable to the performance loss caused by branch predication. Finally, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
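To make the cost of predication described in the abstract more concrete, here is a toy Python model of a single warp executing one `if/else`. It is a sketch under assumptions that are not taken from the paper: a 32-lane warp, side-effect-free branch bodies, and made-up per-branch instruction counts (`then_cost`, `else_cost`). The function name `run_warp_branch` is hypothetical, and real GPUs schedule divergent paths in architecture-specific ways. The model shows that when lanes disagree, every lane computes both branch bodies and the mask decides which result is kept, so the warp pays for both paths.

```python
from typing import Callable, List, Tuple

WARP_SIZE = 32  # assumption: NVIDIA-style warp width


def run_warp_branch(
    lanes: List[float],
    cond: Callable[[float], bool],
    then_fn: Callable[[float], float],
    else_fn: Callable[[float], float],
    then_cost: int,
    else_cost: int,
) -> Tuple[List[float], int]:
    """Toy model of one warp executing `x = then_fn(x) if cond(x) else else_fn(x)`.

    Each branch taken by at least one lane is issued for the whole warp:
    every lane computes it, but a result is committed only where the lane's
    predicate (mask bit) selects that branch. Returns the per-lane results
    and the total issue cost in hypothetical instruction slots.
    """
    if len(lanes) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} lanes, got {len(lanes)}")
    mask = [cond(x) for x in lanes]  # per-lane predicate
    results = list(lanes)
    cost = 0
    if any(mask):  # 'then' path is issued warp-wide
        cost += then_cost
        computed = [then_fn(x) for x in lanes]
        # Keep the 'then' result only on lanes whose predicate is true.
        results = [c if m else r for c, m, r in zip(computed, mask, results)]
    if not all(mask):  # 'else' path is issued warp-wide
        cost += else_cost
        computed = [else_fn(x) for x in lanes]
        # Keep the 'else' result only on lanes whose predicate is false.
        results = [r if m else c for c, m, r in zip(computed, mask, results)]
    return results, cost


if __name__ == "__main__":
    warp = [float(i) for i in range(WARP_SIZE)]

    # Divergent warp: lanes 0..15 take 'then', lanes 16..31 take 'else',
    # so both branch costs are paid (10 + 40).
    _, divergent_cost = run_warp_branch(
        warp, lambda x: x < 16, lambda x: x * 2, lambda x: x + 100, 10, 40)

    # Uniform warp: every lane takes 'then', so only its cost is paid.
    _, uniform_cost = run_warp_branch(
        warp, lambda x: x < 100, lambda x: x * 2, lambda x: x + 100, 10, 40)

    print(divergent_cost, uniform_cost)  # prints 50 10
```

In this model, the more code sits behind a condition, the more a divergent warp wastes: a single lane taking the long branch forces every lane to spend those issue slots.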

Publisher

RTU MIREA

Cited by 5 articles.

1. Identification of digital device hardware vulnerabilities based on scanning systems and semi-natural modeling; Russian Technological Journal; 2024-08-05

2. Method for designing specialized computing systems based on hardware and software co-optimization; Russian Technological Journal; 2024-05-31

3. Automatic Classification of Liquid Crystal Images Based on Topological Analysis; IEEE Sensors Journal; 2023-01-15

4. Feasibility Issues of Complex Information Systems; 2022 4th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA); 2022-11-09

5. Prospects for using soft processors in systems-on-a-chip based on field-programmable gate arrays; Russian Technological Journal; 2022-06-08