A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler

Author:

Booth Joshua Dennis (1), Lane Phillip (1)

Affiliation:

1. Computer Science, The University of Alabama in Huntsville, Huntsville, United States

Abstract

Parallelizing code in a shared-memory environment is commonly done using loop scheduling (LS) in a fork-join manner, as in OpenMP. This style of parallelization is popular because it is easy to code, but the choice of LS method matters when the workload per iteration is highly variable. The shared-memory environment in high-performance computing is currently evolving as larger chiplet-based processors with high core counts and segmented L3 cache are introduced. These processors have a stronger nonuniform memory access (NUMA) effect than the previous generation of x86-64 processors. This work attempts to modify the adaptive self-scheduling loop scheduler known as iCh (irregular Chunk) for these NUMA environments while analyzing the impact of these systems on the default OpenMP LS methods. In particular, iCh serves as a default LS method for irregular applications (i.e., applications where the workload per iteration is highly variable) that guarantees “good” performance without tuning. The modified version, named NiCh, is evaluated on multiple irregular applications to show the variation in performance. The work demonstrates that NiCh handles architectures with stronger NUMA effects better, and in particular outperforms iCh when the number of threads is greater than the number of cores. However, NiCh is less universally “good” than iCh and comes with a set of hardware-dependent parameters.

Publisher

Association for Computing Machinery (ACM)
