NUMA-aware memory manager with dominant-thread-based copying GC

Author:

Takeshi Ogasawara (1)

Affiliation:

1. IBM Research - Tokyo, Yamato, Japan

Abstract

We propose a novel online method for identifying the preferred NUMA nodes for objects with negligible overhead at both garbage collection time and object allocation time. As the number of CPUs (and NUMA nodes) continues to increase, it is critical for the memory manager of an object-oriented language runtime to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses runtime information about memory accesses sampled by the hardware. However, the overhead of this approach is high for a garbage collector. Our approach instead uses information about which thread can exclusively access an object, the Dominant Thread (DoT). The dominant thread of an object is usually the thread that accesses it most frequently, so we do not require memory access samples. Our NUMA-aware GC performs DoT-based object copying, which copies each live object to the CPU where its dominant thread was last dispatched before the GC. The dominant thread information is obtained from the thread stacks and from objects that are locked or reserved by threads, and is propagated along the object reference graph. We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower_ssj2008, SPECjbb2005, and SPECjvm2008. We prototyped a NUMA-aware memory manager on a modified version of the IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements of up to 14.3%, and 2.0% on average, over a JVM that used only the NUMA-aware allocator. The total improvement from using both the NUMA-aware allocator and the GC is up to 53.1%, and 10.8% on average.
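
The mechanism described in the abstract can be summarized in a short sketch. The following Java code is illustrative only and is not taken from the paper or from the IBM Java VM: it assumes a simplified heap model (the Obj class, the placeObjects method, and the thread-to-node tables are all hypothetical) and shows how dominant-thread information could be seeded from thread stacks and lock/reservation owners, propagated along the object reference graph, and then used to choose a NUMA node for each live object.

import java.util.*;

public class DotPlacementSketch {

    // Hypothetical heap object: outgoing references plus an optional
    // lock/reservation owner taken from the object header.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Integer lockOwner;       // id of the thread that locked/reserved this object, if any
        Integer dominantThread;  // filled in during the GC trace

        Obj(String name) { this.name = name; }
    }

    // Decide a target NUMA node for every object reachable from the thread stacks.
    // stackRootsByThread: thread id -> objects referenced from that thread's stack.
    // lastNodeOfThread:   thread id -> NUMA node of the CPU it was last dispatched on.
    static Map<Obj, Integer> placeObjects(Map<Integer, List<Obj>> stackRootsByThread,
                                          Map<Integer, Integer> lastNodeOfThread,
                                          int defaultNode) {
        Deque<Obj> work = new ArrayDeque<>();
        List<Obj> traced = new ArrayList<>();

        // Seed dominant threads from the thread-stack roots; a lock/reservation
        // owner recorded in the object takes precedence over the scanning thread.
        for (Map.Entry<Integer, List<Obj>> e : stackRootsByThread.entrySet()) {
            for (Obj root : e.getValue()) {
                if (root.dominantThread == null) {
                    root.dominantThread = (root.lockOwner != null) ? root.lockOwner : e.getKey();
                    work.add(root);
                    traced.add(root);
                }
            }
        }

        // Propagate the dominant thread along the object reference graph.
        while (!work.isEmpty()) {
            Obj o = work.poll();
            for (Obj child : o.refs) {
                if (child.dominantThread == null) {
                    child.dominantThread =
                        (child.lockOwner != null) ? child.lockOwner : o.dominantThread;
                    work.add(child);
                    traced.add(child);
                }
            }
        }

        // Placement decision: the node of the dominant thread's last CPU,
        // or a default node if that thread's dispatch location is unknown.
        Map<Obj, Integer> placement = new IdentityHashMap<>();
        for (Obj o : traced) {
            placement.put(o, lastNodeOfThread.getOrDefault(o.dominantThread, defaultNode));
        }
        return placement;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);      // thread 1's stack reaches a, which references b and then c
        b.refs.add(c);
        c.lockOwner = 2;    // c is locked/reserved by thread 2, so thread 2 dominates it

        Map<Integer, List<Obj>> roots = Map.of(1, List.of(a));
        Map<Integer, Integer> lastNode = Map.of(1, 0, 2, 3);   // thread id -> NUMA node

        placeObjects(roots, lastNode, 0)
            .forEach((o, node) -> System.out.println(o.name + " -> node " + node));
        // a and b are placed on node 0 (thread 1); c is placed on node 3 (thread 2).
    }
}

In the actual collector this decision would drive where each live object is copied during GC; the sketch only computes the placement map so that the example stays self-contained.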

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Cited by 7 articles.

1. Virtual prototyping of complex optical systems on multiprocessor workstations;Optical Engineering;2022-12-07

2. Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures;ACM Transactions on Architecture and Code Optimization;2016-09-17

3. A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures;Euro-Par 2016: Parallel Processing;2016

4. Topology-Aware Parallelism for NUMA Copying Collectors;Languages and Compilers for Parallel Computing;2016

5. SmartStealing;Proceedings of the Principles and Practices of Programming on The Java Platform;2015-09-08
