Conditions for the existence of broadcast and spatial locality in computation threads

Author:

Likhoded N. A.¹

Affiliation:

1. Belarusian State University

Abstract

Graphics processing units (GPUs) are considered as the target platform for implementing parallel algorithms. The set of operations of an algorithm to be implemented on a GPU must be split into computation threads, and the threads must be grouped into computation blocks, each of which is executed atomically (as an indivisible unit) on a streaming multiprocessor. The threads of a block are executed on the multiprocessor in groups called warps; the threads of a warp are executed simultaneously. The efficiency of a parallel algorithm depends on how its data are stored in GPU memory. If all threads of a warp request the same datum when executing the current statement, it is desirable to place that datum in the shared or constant memory of the GPU; its delivery to the cores of the multiprocessor is then effectively performed by a broadcast. If the threads of a warp request data located close to one another in memory, the data exhibit spatial locality, which makes it advisable to place these data in the GPU's memory. Implementing a broadcast or exploiting spatial locality by placing data in memory of the appropriate type can significantly reduce the traffic exchanged between the levels of the GPU memory hierarchy. This paper formulates and proves necessary and sufficient conditions under which a broadcast can be performed or spatial locality of data is present. The conditions are stated in terms of the functions that determine which array elements are used at occurrences in the algorithm's statements and the functions that define the information dependencies of the algorithm. The results can be used to optimize parallel algorithms implemented on GPUs.
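
The two access patterns described above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the conditions proved in the paper: each array access is assumed to be an affine function F(J) = A·J + b of the iteration vector J, the threads of one warp are assumed to differ only along a single direction d (thread t executes iteration J0 + t·d), arrays are assumed to be stored in row-major order, "spatial locality" is narrowed to unit stride between neighbouring threads, and information dependencies are not modeled. The helper names (`warp_offsets`, `classify`, and so on) are hypothetical.

```python
from math import prod
from typing import List, Sequence

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

Matrix = Sequence[Sequence[int]]
Vector = Sequence[int]


def mat_vec(a: Matrix, v: Vector) -> List[int]:
    # product of an integer matrix and an integer vector
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def row_major_strides(shape: Vector) -> List[int]:
    # stride of dimension k = product of the extents of all later dimensions
    return [prod(shape[k + 1:]) for k in range(len(shape))]


def offset(index: Vector, shape: Vector) -> int:
    # linear position of an index vector in row-major storage (linear in index)
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


def warp_offsets(a: Matrix, b: Vector, j0: Vector, d: Vector,
                 shape: Vector, warp_size: int = WARP_SIZE) -> List[int]:
    """Offsets requested by one warp (hypothetical model): thread t executes
    iteration j0 + t*d and touches the element F(J) = a*J + b."""
    result = []
    for t in range(warp_size):
        j = [x + t * y for x, y in zip(j0, d)]
        idx = [x + y for x, y in zip(mat_vec(a, j), b)]
        result.append(offset(idx, shape))
    return result


def classify(a: Matrix, d: Vector, shape: Vector) -> str:
    """Because F is affine and offset() is linear, neighbouring threads differ
    by the constant offset(a*d); b and j0 do not affect the pattern."""
    delta = offset(mat_vec(a, d), shape)
    if delta == 0:
        return "broadcast"          # every thread reads the same element
    if abs(delta) == 1:
        return "spatial locality"   # neighbouring threads read neighbouring elements
    return "neither"


if __name__ == "__main__":
    # Matrix product C[i][j] += A[i][k] * B[k][j], iteration vector J = (i, j, k);
    # consecutive threads of a warp are assumed to differ in j.
    N = 64
    d = (0, 1, 0)
    accesses = [
        ("A[i][k]", [[1, 0, 0], [0, 0, 1]]),
        ("B[k][j]", [[0, 0, 1], [0, 1, 0]]),
        ("B[j][k]", [[0, 1, 0], [0, 0, 1]]),
    ]
    for name, acc in accesses:
        print(name, classify(acc, d, (N, N)))
    # prints:
    # A[i][k] broadcast
    # B[k][j] spatial locality
    # B[j][k] neither
```

In this model, the access A[i][k] does not depend on j, so the whole warp requests one element (a candidate for shared or constant memory), whereas B[k][j] is read with unit stride across the warp. The analytic test in `classify` can be cross-checked by enumerating the offsets with `warp_offsets`.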

Publisher

Publishing House Belorusskaya Nauka

Subject

Computational Theory and Mathematics, General Physics and Astronomy, General Mathematics

