Scalable concurrent and parallel mark

Author:

Iyengar Balaji (1), Gehringer Edward (2), Wolf Michael (3), Manivannan Karthikeyan (3)

Affiliation:

1. Azul Systems Inc., Sunnyvale, CA, USA

2. North Carolina State University, Raleigh, NC, USA

3. Azul Systems Inc., Sunnyvale, CA, USA

Abstract

Parallel marking algorithms use multiple threads to walk the object heap graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or a bit in the object header. Most of these parallel algorithms strive to improve marking throughput by using work-stealing algorithms for load-balancing, ensuring that all participating threads are kept busy. A purely "processor-centric" load-balancing approach, combined with the need to atomically set the mark bit, results in significant contention during parallel marking. This limits the scalability and throughput of parallel marking algorithms. We describe a new non-blocking, lock-free work-sharing algorithm whose primary goal is to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking, yielding performance improvements of 20%. In the case of concurrent and on-the-fly collector algorithms, mutator threads also generate marking work for the marking task-threads. In these schemes, mutator threads are also provided with thread-local marking stacks where they collect references to potentially "gray" objects, i.e., objects that haven't been "marked-through" by the collector. We note that since this work is generated by mutators when they reference these objects, there is a high likelihood that these objects are still present in the processor cache. We describe and evaluate a scheme, cognizant of the processor and cache topology, for distributing mutator-generated marking work among the collector's task-threads. We prototype both our algorithms within the C4 [28] collector that ships as part of an industrial-strength JVM for the Linux-X86 platform.
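
As a rough illustration of the two ideas in the abstract (atomically setting a mark bit, and using the mark-bitmap word as the striping key), the following Java sketch may help. It is not the authors' implementation: the class `StripedMarkSketch`, the 8-byte object alignment, the one-bit-per-object bitmap layout, and the modulo mapping from bitmap word to stripe are all assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch; names, layout, and stripe mapping are assumptions, not from the paper. */
final class StripedMarkSketch {
    // Assumption: objects are 8-byte aligned and each gets one mark bit.
    static final int OBJECT_ALIGNMENT_SHIFT = 3;
    // 64 mark bits fit in one 64-bit bitmap word.
    static final int BITS_PER_WORD_SHIFT = 6;

    private final AtomicLongArray bitmap;
    private final long heapBase;
    private final int stripeCount;

    StripedMarkSketch(long heapBase, long heapBytes, int stripeCount) {
        this.heapBase = heapBase;
        this.stripeCount = stripeCount;
        long bits = heapBytes >>> OBJECT_ALIGNMENT_SHIFT;
        this.bitmap = new AtomicLongArray((int) ((bits + 63) >>> BITS_PER_WORD_SHIFT));
    }

    /** Index of the mark-bitmap word that holds the mark bit for this address. */
    int wordIndex(long objectAddress) {
        long bit = (objectAddress - heapBase) >>> OBJECT_ALIGNMENT_SHIFT;
        return (int) (bit >>> BITS_PER_WORD_SHIFT);
    }

    /**
     * Stripe key derived from the bitmap word. Objects whose mark bits share a
     * word always land in the same stripe, so only the threads assigned to that
     * stripe ever update that word. The modulo mapping is an assumption.
     */
    int stripeOf(long objectAddress) {
        return wordIndex(objectAddress) % stripeCount;
    }

    /** Atomically sets the mark bit; returns true only for the caller that set it. */
    boolean tryMark(long objectAddress) {
        long bit = (objectAddress - heapBase) >>> OBJECT_ALIGNMENT_SHIFT;
        int word = (int) (bit >>> BITS_PER_WORD_SHIFT);
        long mask = 1L << (bit & 63);
        while (true) {
            long old = bitmap.get(word);
            if ((old & mask) != 0) {
                return false; // already marked by some thread
            }
            if (bitmap.compareAndSet(word, old, old | mask)) {
                return true;  // this caller won the race and owns the object
            }
            // CAS failed because another bit in the same word changed; retry.
        }
    }
}
```

In this sketch, a marker thread that discovers a reference would compute `stripeOf(address)` and hand the reference to a thread serving that stripe rather than marking it directly, which is where the reduction in contention on the compare-and-set would come from.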

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

References (30 articles)

1. Intel® 64 and IA-32 architectures developer's manual: Combined volumes. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

2. Intel® 64 architecture processor topology enumeration. URL http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/.

3. Standard Performance Evaluation Corporation. SPEC JVM98. URL http://www.spec.org/jvm98/.

4. The Volano benchmark. URL http://www.volano.com/benchmarks.html.

5. Thread scheduling for multiprogrammed multiprocessors

Cited by 7 articles.

1. Understanding and improving JVM GC work stealing at the data center scale;ACM SIGPLAN Notices;2018-07-19

2. Understanding and improving JVM GC work stealing at the data center scale;Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management;2016-06-14

3. SmartStealing;Proceedings of the Principles and Practices of Programming on The Java Platform;2015-09-08

4. Evaluating HTM for Pauseless Garbage Collectors in Java;2015 IEEE Trustcom/BigDataSE/ISPA;2015-08

5. Mark without much Sweep Algorithm for Garbage Collection;Automatika;2014-01
