Scalable concurrent and parallel mark

Published:2013-01-08 Issue:11 Volume:47 Page:61-72
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Iyengar Balaji¹, Gehringer Edward², Wolf Michael³, Manivannan Karthikeyan³

Affiliation:

1. Azul Systems Inc, Sunnyvale, CA, USA

2. North Carolina State University, Raleigh, NC, USA

3. Azul Systems Inc., Sunnyvale, CA, USA

Abstract

Parallel marking algorithms use multiple threads to walk through the object heap graph and mark each reachable object as live. Parallel marker threads mark an object "live" by atomically setting a bit in a mark-bitmap or a bit in the object header. Most of these parallel algorithms strive to improve marking throughput by using work-stealing algorithms for load-balancing, so that all participating threads are kept busy. A purely "processor-centric" load-balancing approach, in conjunction with the need to atomically set the mark bit, results in significant contention during parallel marking. This limits the scalability and throughput of parallel marking algorithms. We describe a new non-blocking, lock-free work-sharing algorithm whose primary goal is to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking and yields a 20% improvement in performance. In the case of concurrent and on-the-fly collector algorithms, mutator threads also generate marking work for the marking task-threads. In these schemes, mutator threads are also provided with thread-local marking stacks where they collect references to potentially "gray" objects, i.e., objects that haven't been "marked-through" by the collector. We note that since this work is generated by mutators when they reference these objects, there is a high likelihood that these objects are still present in the processor cache. We describe and evaluate a scheme, cognizant of the processor and cache topology, for distributing mutator-generated marking work among the collector's task-threads. We prototype both of our algorithms within the C4 [28] collector, which ships as part of an industrial-strength JVM for the Linux-X86 platform.
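The striping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the heap base, object alignment, bitmap word width, stripe count, subset size, and all function names below are hypothetical choices made for illustration. The sketch keys on the index of the mark-bitmap word, which differs from the word's address only by a fixed base and scale.

```python
# Illustrative constants; none of these values come from the paper.
HEAP_BASE = 0x1000_0000   # hypothetical heap start address
OBJ_ALIGN = 8             # assume one mark bit per 8-byte-aligned slot
BITS_PER_WORD = 64        # assume 64-bit mark-bitmap words
NUM_STRIPES = 4           # hypothetical number of stripes
THREADS_PER_STRIPE = 2    # hypothetical size of each task-thread subset


def bitmap_word_index(addr):
    """Index of the mark-bitmap word that holds the mark bit for addr."""
    bit_index = (addr - HEAP_BASE) // OBJ_ALIGN
    return bit_index // BITS_PER_WORD


def stripe_of(addr):
    """Stripe key derived from the mark-bitmap word, not from the processor."""
    return bitmap_word_index(addr) % NUM_STRIPES


def threads_for_stripe(stripe):
    """The subset of task-thread ids allowed to work on a given stripe."""
    first = stripe * THREADS_PER_STRIPE
    return list(range(first, first + THREADS_PER_STRIPE))


def route(refs):
    """Group object references into per-stripe work lists."""
    queues = {s: [] for s in range(NUM_STRIPES)}
    for addr in refs:
        queues[stripe_of(addr)].append(addr)
    return queues
```

Under these assumptions, all references whose mark bits fall in the same bitmap word land in the same stripe, so only the small subset of threads assigned to that stripe ever issues atomic updates against that word.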

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2426642.2259006

References: 30 articles.

1. Intel® 64 and IA-32 architectures developer's manual: Combined volumes. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

2. Intel® 64 architecture processor topology enumeration. URL http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/.

3. Standard Performance Evaluation Corporation. SPEC JVM98. URL http://www.spec.org/jvm98/.

4. The Volano benchmark. URL http://www.volano.com/benchmarks.html.

5. Thread scheduling for multiprogrammed multiprocessors

Cited by 7 articles.

1. Understanding and improving JVM GC work stealing at the data center scale;ACM SIGPLAN Notices;2018-07-19

2. Understanding and improving JVM GC work stealing at the data center scale;Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management;2016-06-14

3. SmartStealing;Proceedings of the Principles and Practices of Programming on The Java Platform;2015-09-08

4. Evaluating HTM for Pauseless Garbage Collectors in Java;2015 IEEE Trustcom/BigDataSE/ISPA;2015-08

5. Mark without much Sweep Algorithm for Garbage Collection;Automatika;2014-01