A Library for Portable and Composable Data Locality Optimizations for NUMA Systems

Published:2017-03-23 Issue:4 Volume:3 Page:1-32
ISSN:2329-4949
Container-title:ACM Transactions on Parallel Computing
language:en
Short-container-title:ACM Trans. Parallel Comput.

Author:

Majo Zoltan¹, Gross Thomas R.¹

Affiliation:

1. ETH Zurich, Switzerland

Abstract

Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA) and accesses to remote memory locations take more time than local memory accesses. Optimizing NUMA memory system performance is difficult and costly for three principal reasons: (1) Today’s programming languages/libraries have no explicit support for NUMA systems, (2) NUMA optimizations are not portable, and (3) optimizations are not composable (i.e., they can become ineffective or worsen performance in environments that support composable parallel software). This article presents TBB-NUMA, a parallel programming library based on Intel Threading Building Blocks (TBB) that supports portable and composable NUMA-aware programming. TBB-NUMA provides a model of task affinity that captures a programmer’s insights on mapping tasks to resources. NUMA-awareness affects all layers of the library (i.e., resource management, task scheduling, and high-level parallel algorithm templates) and requires close coupling between all these layers. Optimizations implemented with TBB-NUMA (for a set of standard benchmark programs) result in up to 44% performance improvement over standard TBB. But more important, optimized programs are portable across different NUMA architectures and preserve data locality also when composed with other parallel computations sharing the same resource management layer.

Funder

SNF

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3040222

Cited by 8 articles.

1. WASP: Workload-Aware Self-Replicating Page-Tables for NUMA Servers;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2024-04-27

2. Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit;ACM Transactions on Modeling and Performance Evaluation of Computing Systems;2020-12-31

3. Bandwidth-Aware Page Placement in NUMA;2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2020-05

4. Mozart: Efficient Composition of Library Functions for Heterogeneous Execution;Languages and Compilers for Parallel Computing;2019

5. Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree;Computers;2018-12-03