NUMA-aware memory manager with dominant-thread-based copying GC

Author:

Takeshi Ogasawara (1)

Affiliation:

1. IBM Research - Tokyo, Yamato, Japan

Abstract

We propose a novel online method for identifying the preferred NUMA nodes for objects with negligible overhead at both garbage collection time and object allocation time. As the number of CPUs (and NUMA nodes) continues to increase, it is critical for the memory manager of an object-oriented language runtime to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses runtime information about memory accesses sampled by the hardware. However, the overhead of this approach is high for a garbage collector. Our approach instead uses information about which thread can exclusively access an object, the Dominant Thread (DoT). The dominant thread of an object is usually the thread that accesses it most frequently, so we do not require memory access samples. Our NUMA-aware GC performs DoT-based object copying, which copies each live object to the CPU where its dominant thread was last dispatched before the GC. The dominant thread information is obtained from the thread stacks and from objects that are locked or reserved by threads, and is propagated along the object reference graph. We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower_ssj2008, SPECjbb2005, and SPECjvm2008. We prototyped a NUMA-aware memory manager on a modified version of the IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements of up to 14.3%, and 2.0% on average, over a JVM that used only the NUMA-aware allocator. The total improvement from using both the NUMA-aware allocator and the GC is up to 53.1%, and 10.8% on average.
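
The mechanism described in the abstract can be summarized in a short sketch. The following Java code is illustrative only and is not taken from the paper or from the IBM Java VM: it assumes a simplified heap model (the Obj class, the placeObjects method, and the thread-to-node tables are all hypothetical) and shows how dominant-thread information could be seeded from thread stacks and lock/reservation owners, propagated along the object reference graph, and then used to choose a NUMA node for each live object.

import java.util.*;

public class DotPlacementSketch {

    // Hypothetical heap object: outgoing references plus an optional
    // lock/reservation owner taken from the object header.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Integer lockOwner;       // id of the thread that locked/reserved this object, if any
        Integer dominantThread;  // filled in during the GC trace

        Obj(String name) { this.name = name; }
    }

    // Decide a target NUMA node for every object reachable from the thread stacks.
    // stackRootsByThread: thread id -> objects referenced from that thread's stack.
    // lastNodeOfThread:   thread id -> NUMA node of the CPU it was last dispatched on.
    static Map<Obj, Integer> placeObjects(Map<Integer, List<Obj>> stackRootsByThread,
                                          Map<Integer, Integer> lastNodeOfThread,
                                          int defaultNode) {
        Deque<Obj> work = new ArrayDeque<>();
        List<Obj> traced = new ArrayList<>();

        // Seed dominant threads from the thread-stack roots; a lock/reservation
        // owner recorded in the object takes precedence over the scanning thread.
        for (Map.Entry<Integer, List<Obj>> e : stackRootsByThread.entrySet()) {
            for (Obj root : e.getValue()) {
                if (root.dominantThread == null) {
                    root.dominantThread = (root.lockOwner != null) ? root.lockOwner : e.getKey();
                    work.add(root);
                    traced.add(root);
                }
            }
        }

        // Propagate the dominant thread along the object reference graph.
        while (!work.isEmpty()) {
            Obj o = work.poll();
            for (Obj child : o.refs) {
                if (child.dominantThread == null) {
                    child.dominantThread =
                        (child.lockOwner != null) ? child.lockOwner : o.dominantThread;
                    work.add(child);
                    traced.add(child);
                }
            }
        }

        // Placement decision: the node of the dominant thread's last CPU,
        // or a default node if that thread's dispatch location is unknown.
        Map<Obj, Integer> placement = new IdentityHashMap<>();
        for (Obj o : traced) {
            placement.put(o, lastNodeOfThread.getOrDefault(o.dominantThread, defaultNode));
        }
        return placement;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);      // thread 1's stack reaches a, which references b and then c
        b.refs.add(c);
        c.lockOwner = 2;    // c is locked/reserved by thread 2, so thread 2 dominates it

        Map<Integer, List<Obj>> roots = Map.of(1, List.of(a));
        Map<Integer, Integer> lastNode = Map.of(1, 0, 2, 3);   // thread id -> NUMA node

        placeObjects(roots, lastNode, 0)
            .forEach((o, node) -> System.out.println(o.name + " -> node " + node));
        // a and b are placed on node 0 (thread 1); c is placed on node 3 (thread 2).
    }
}

In the actual collector this decision would drive where each live object is copied during GC; the sketch only computes the placement map so that the example stays self-contained.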

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Cited by 7 articles.

1. Virtual prototyping of complex optical systems on multiprocessor workstations;Optical Engineering;2022-12-07

2. Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures;ACM Transactions on Architecture and Code Optimization;2016-09-17

3. A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures;Euro-Par 2016: Parallel Processing;2016

4. Topology-Aware Parallelism for NUMA Copying Collectors;Languages and Compilers for Parallel Computing;2016

5. SmartStealing;Proceedings of the Principles and Practices of Programming on The Java Platform;2015-09-08
