Method for Adaptation of Algorithms to GPU Architecture

Published: 2021
Container-title: Proceedings of the 31th International Conference on Computer Graphics and Vision. Volume 2

Author:

Bulavintsev Vadim¹, Zhdanov Dmitry¹

Affiliation:

1. ITMO University

Abstract

We propose a generalized method for adapting and optimizing algorithms for efficient execution on modern graphics processing units (GPUs). The method consists of three steps. First, build a control flow graph (CFG) of the algorithm. Next, transform the CFG into a tree of loops and merge non-parallelizable loops into parallelizable ones. Finally, map the resulting loop tree onto the tree of GPU computational units, unrolling the algorithm’s loops as necessary to make the two trees match. The mapping should be performed bottom-up, from the lowest levels of the GPU architecture to the highest, to minimize off-chip memory access and maximize register file usage. The method gives the programmer a convenient and robust mental framework and strategy for GPU code optimization. We demonstrate the method by adapting the DPLL backtracking search algorithm for the Boolean satisfiability problem (SAT) to a GPU. The resulting GPU version of DPLL outperforms the CPU version in raw tree search performance sixfold on regular Boolean satisfiability problems and twofold on irregular ones.
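For readers unfamiliar with the demonstration algorithm, the following is a minimal CPU-side sketch of textbook DPLL: unit propagation followed by branching with backtracking. It is not the authors' GPU implementation and does not show the CFG or loop-tree transformation; the function names `simplify` and `dpll` and the DIMACS-style integer encoding of literals are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed encoding (DIMACS style): literal 3 means x3, literal -3 means NOT x3.
Clause = List[int]
Assignment = Dict[int, bool]


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Assume `lit` is true; return the reduced formula, or None on conflict."""
    reduced = []
    for clause in clauses:
        if lit in clause:
            continue  # clause is satisfied, drop it
        rest = [l for l in clause if l != -lit]  # remove the falsified literal
        if not rest:
            return None  # empty clause: this branch is contradictory
        reduced.append(rest)
    return reduced


def dpll(clauses: List[Clause],
         assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a satisfying (possibly partial) assignment, or None if UNSAT."""
    assignment = dict(assignment or {})
    if any(len(c) == 0 for c in clauses):
        return None  # an empty clause in the input can never be satisfied

    # Unit propagation: a one-literal clause forces that literal to be true.
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
        clauses = simplify(clauses, unit)
        if clauses is None:
            return None

    if not clauses:
        return assignment  # every clause has been satisfied

    # Branch on the first literal of the first clause, then on its negation.
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**assignment, abs(choice): choice > 0})
            if result is not None:
                return result
    return None  # both branches failed: backtrack
```

Calling `dpll([[1, 2], [-1, 3], [-2, -3]])` returns a dictionary mapping variable numbers to truth values, and `dpll([[1], [-1]])` returns `None`. The recursion over the two branches is the tree search whose raw performance the abstract compares between CPU and GPU.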

Publisher

Keldysh Institute of Applied Mathematics

References (24 articles)

1. M. J. Flynn, Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers C-21 (1972) 948–960. URL: http://ieeexplore.ieee.org/document/5009071/. doi:10.1109/TC.1972.5009071.

2. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A System for Large-Scale Machine Learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association, Savannah, GA, 2016, pp. 265–283. URL: https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi.

3. NVIDIA, P. Vingelmann, F. H. Fitzek, CUDA, release: 10.2.89, 2020. URL: https://developer.nvidia.com/cuda-toolkit.

4. J. E. Stone, D. Gohara, G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering 12 (2010) 66–73. doi:10.1109/MCSE.2010.69.

5. R. Dolbeau, F. Bodin, G. C. de Verdiere, One OpenCL to rule them all?, in: 2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS), IEEE, 2013, pp. 1–6.