Affiliation:
1. MIREA – Russian Technological University
Abstract
General-purpose computing on graphics processing units (GPGPU) is a powerful technology for offloading parallel data processing tasks to graphics processing units (GPUs). It is used in a variety of domains, from science and commerce to hobbyist projects. General-purpose programs running on GPUs inevitably encounter performance issues stemming from branch predication. Branch predication is a GPU feature that causes both branches of a conditional to execute, with the results of the incorrect branch masked out. This leads to considerable performance losses for GPU programs in which large amounts of code sit behind conditional operators. This paper analyzes existing approaches to improving software performance with respect to mitigating this loss. Each approach is described along with its advantages, disadvantages, extent of applicability, and whether it addresses the outlined problem. The approaches covered are optimizing compilers, JIT compilation, branch prediction, speculative execution, adaptive optimization, run-time algorithm specialization, and profile-guided optimization. It is shown that these methods mostly target CPU-specific issues and are generally not applicable as far as the performance loss from branch predication is concerned. Lastly, we outline the need for a separate performance-improving approach that addresses the specifics of branch predication and the GPGPU workflow.
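The cost of predication described in the abstract can be illustrated with a toy model. The sketch below is a plain Python simulation, not GPU code, and every name in it (`run_predicated`, `run_branching`, `lanes`, and so on) is a hypothetical chosen for illustration. It assumes a group of lanes executing in lockstep: both branch bodies are evaluated for every lane, and a per-lane mask selects which result is kept.

```python
# Toy model of branch predication on one group of lockstep lanes.
# All names are hypothetical; this does not model any specific GPU or API.

def run_predicated(values, cond, then_op, else_op):
    """Evaluate both branch bodies for every lane, then keep one result per lane."""
    mask = [cond(v) for v in values]              # per-lane predicate
    then_results = [then_op(v) for v in values]   # issued for all lanes
    else_results = [else_op(v) for v in values]   # issued for all lanes as well
    results = [t if m else e
               for m, t, e in zip(mask, then_results, else_results)]
    body_passes = 2  # both bodies were issued, whatever the mask contains
    return results, body_passes


def run_branching(values, cond, then_op, else_op):
    """Reference semantics: each value goes through only its own branch."""
    return [then_op(v) if cond(v) else else_op(v) for v in values]


def is_even(v):
    return v % 2 == 0


def double(v):
    return v * 2


def add_hundred(v):
    return v + 100


lanes = list(range(8))
predicated, passes = run_predicated(lanes, is_even, double, add_hundred)
reference = run_branching(lanes, is_even, double, add_hundred)
```

In this model the predicated run produces the same values as ordinary branching, but it always pays for both bodies, even when every lane takes the same side. That fixed cost is the loss the abstract refers to, and it grows with the amount of code placed behind the conditional.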