Affiliation:
1. IBM Research - Tokyo, Yamato, Japan
Abstract
We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is critical for the memory manager of the runtime environment of an object-oriented language to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses the runtime information about memory accesses as sampled by the hardware. However, the overhead of this approach is high for a garbage collector.
Our approach uses the information about which thread can exclusively access an object, or the
Dominant Thread
(DoT). The dominant thread of an object is the thread that often most accesses an object so that we do not require memory access samples. Our NUMA-aware GC performs DoT based object copying, which copies each live object to the CPU where the dominant thread was last dispatched before GC. The dominant thread information is known from the thread stack and from objects that are locked or reserved by threads and is propagated in the object reference graph.
We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower ssj2008, SPECjbb2005, and SPECjvm2008.We prototyped a NUMAaware memory manager on a modified version of IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements up to 14.3% and 2.0% on average over a JVM that only used the NUMA-aware allocator. The total improvement using both the NUMA-aware allocator and GC is up to 53.1% and 10.8% on average.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design,Software
Reference37 articles.
1. Advanced Micro Devices Inc. Performance guidelines for AMD AthlonTM64 and AMD OpteronTMccNUMA multiprocessor systems 2006. Advanced Micro Devices Inc. Performance guidelines for AMD AthlonTM64 and AMD OpteronTMccNUMA multiprocessor systems 2006.
2. Thin locks
3. BEA Systems Inc. BEA JRockit r27.5 commandline reference. http://e-docs.bea.com/jrockit/jrdocs/pdf/refman.pdf January 2008. BEA Systems Inc. BEA JRockit r27.5 commandline reference. http://e-docs.bea.com/jrockit/jrdocs/pdf/refman.pdf January 2008.
Cited by
7 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献