An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes

Published: 2021-11-30 Issue: 1-4 Volume: 39 Page: 1-30
ISSN: 0734-2071
Container-title: ACM Transactions on Computer Systems
Language: en
Short-container-title: ACM Trans. Comput. Syst.

Author:

Lyerly Robert¹, Bilbao Carlos¹, Min Changwoo¹, Rossbach Christopher J.², Ravindran Binoy¹

Affiliation:

1. Virginia Tech

2. University of Texas at Austin and VMware Research

Abstract

In this work, we present libHetMP, an OpenMP runtime for automatically and transparently distributing parallel computation across heterogeneous nodes. libHetMP targets platforms comprising CPUs with different instruction set architectures (ISAs) coupled by a high-speed memory interconnect, where cross-ISA binary incompatibility and non-coherent caches require application data to be marshaled before it can be shared across CPUs. Because of this, work distribution decisions must take into account both the relative compute performance of asymmetric CPUs and communication overheads. libHetMP drives workload distribution decisions without programmer intervention by measuring performance characteristics during cross-node execution. A novel HetProbe loop iteration scheduler decides whether cross-node execution is beneficial. When it is, the scheduler distributes work according to the relative performance of the CPUs; when it is not, it places all work on the set of homogeneous CPUs that provides the best performance. We evaluate libHetMP using compute kernels from several OpenMP benchmark suites and show a geometric mean 41% speedup in execution time across asymmetric CPUs. Because some workloads may exhibit irregular behavior across iterations, we extend libHetMP with a second scheduler called HetProbe-I. The evaluation of HetProbe-I shows that it can further improve speedup for irregular computation, in some cases by up to 24%, by triggering periodic distribution decisions.
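The decision the abstract attributes to HetProbe (share work in proportion to measured performance, or fall back to the best homogeneous set of CPUs) can be illustrated with a minimal sketch. This is not libHetMP's algorithm: the function name `plan_work`, the per-node probe rates, and the single additive `comm_overhead_s` cost model are all assumptions made for illustration, since the abstract does not state the actual criteria.

```python
def plan_work(total_iters, probe_rates, comm_overhead_s):
    """Return {node: iteration_count} for one parallel loop.

    probe_rates: iterations/second per node, as if measured in a probe phase
                 (hypothetical input; not libHetMP's real metric).
    comm_overhead_s: assumed extra seconds that cross-node execution costs
                     for data marshaling (hypothetical cost model).
    """
    # Fastest node, used as the single-node fallback.
    best = max(probe_rates, key=probe_rates.get)
    total_rate = sum(probe_rates.values())

    # Estimated run times under the two options.
    single_time = total_iters / probe_rates[best]
    cross_time = total_iters / total_rate + comm_overhead_s

    # Cross-node execution does not pay off: keep all work on the best node.
    if cross_time >= single_time:
        return {node: (total_iters if node == best else 0)
                for node in probe_rates}

    # Otherwise split iterations in proportion to measured rates.
    shares = {node: int(total_iters * rate / total_rate)
              for node, rate in probe_rates.items()}
    # Give any rounding remainder to the fastest node.
    shares[best] += total_iters - sum(shares.values())
    return shares


if __name__ == "__main__":
    rates = {"x86": 300.0, "arm": 100.0}
    print(plan_work(1000, rates, comm_overhead_s=0.1))
    print(plan_work(1000, rates, comm_overhead_s=5.0))
```

With a small assumed overhead the sketch splits the loop 3:1 between the two nodes; with a large one it keeps every iteration on the faster node. A periodic variant in the spirit of HetProbe-I would simply call such a function again with fresh probe rates.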

Funder

US Office of Naval Research

NAVSEA/NEEC

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3505224

Cited by 2 articles.

1. Flexible system software scheduling for asymmetric multicore systems with PMCSched: A case for Intel Alder Lake;Concurrency and Computation: Practice and Experience;2023-06-06

2. An Empirical View on Consolidation of the Web;ACM Transactions on Internet Technology;2022-02-12