Lock Contention Management in Multithreaded MPI

Author:

Amer Abdelhalim (1), Lu Huiwei (1), Balaji Pavan (1), Chabbi Milind (2), Wei Yanjie (3), Hammond Jeff (4), Matsuoka Satoshi (5)

Affiliation:

1. Argonne National Laboratory, USA

2. Hewlett-Packard Labs, USA

3. Shenzhen Institute of Advanced Technologies, Chinese Academy of Sciences, China

4. Intel, USA

5. Tokyo Institute of Technology, Japan

Abstract

In this article, we investigate contention management in lock-based thread-safe MPI libraries. Specifically, we make two assumptions: (1) locks are the only form of synchronization when protecting communication paths; and (2) contention occurs, and thus serialization is unavoidable. Our work distinguishes lock acquisitions by the work performed inside the critical section: productive versus unproductive. Waiting for message reception without doing anything else inside a critical section is an example of unproductive lock acquisition. We show that the high-throughput nature of modern scalable locking protocols translates into better communication progress for throughput-intensive MPI communication but negatively impacts latency-sensitive communication because of overzealous unproductive lock acquisition. To reduce unproductive lock acquisitions, we devised a method that promotes threads with productive work using a generic two-level priority locking protocol. Our results show that using a high-throughput protocol for productive work and a fair protocol for less productive code paths ensures the best tradeoff for fine-grained communication, whereas a fair protocol is sufficient for more coarse-grained communication. Although these efforts have been rewarding, scalability degradation remains significant. We discuss techniques that diverge from the pure locking model and offer the potential to further improve scalability.
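The two-level priority idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation and not MPICH code: the class name `TwoLevelPriorityLock`, the `productive` flag, and the waiter counters are hypothetical names chosen for illustration. It shows only the priority rule, in which a thread with productive work is admitted ahead of threads that would merely poll for progress. It does not model the per-level protocols the abstract mentions (high-throughput for productive work, fair for the rest).

```python
import threading


class TwoLevelPriorityLock:
    """Illustrative two-level priority lock (hypothetical, not MPICH's).

    Callers declare whether the critical section will do productive work
    (for example, issuing a send) or unproductive work (for example,
    polling for a message). Low-priority callers wait while the lock is
    held or while any high-priority caller is waiting.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._high_waiting = 0  # productive callers currently waiting
        self._low_waiting = 0   # unproductive callers waiting (inspection only)

    def acquire(self, productive):
        with self._cond:
            if productive:
                self._high_waiting += 1
                while self._held:
                    self._cond.wait()
                self._high_waiting -= 1
            else:
                self._low_waiting += 1
                # Yield to every waiting productive caller before entering.
                while self._held or self._high_waiting > 0:
                    self._cond.wait()
                self._low_waiting -= 1
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            # Wake all waiters; each one re-checks its own admission rule.
            self._cond.notify_all()
```

A thread about to issue a communication call would use `acquire(productive=True)`, while a thread that only polls for progress would use `acquire(productive=False)`. Under this rule a polling thread cannot overtake a waiting productive thread, which is the behavior the abstract describes as reducing unproductive lock acquisitions.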

Funder

Exascale Computing Project

Science Technology and Innovation Committee of Shenzhen Municipality

JSPS KAKENHI

U.S. Department of Energy Office of Science

National Nuclear Security Administration

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

References: 26 articles.

1. Lock Cohorting

2. Abdelhalim Amer, Pavan Balaji, Wesley Bland, William Gropp, Rob Latham, Huiwei Lu, Lena Oden, Antonio Pena, Ken Raffenetti, Sangmin Seo, et al. 2015. MPICH User’s Guide.

3. Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS

4. MPI+Threads: runtime contention and remedies

5. An Sn Algorithm for the Massively Parallel CM-200 Computer

Cited by 6 articles.

1. X-OpenMP — eXtreme fine-grained tasking using lock-less work stealing;Future Generation Computer Systems;2024-10

2. X-OpenMP – Extreme Fine-Grained Tasking Using Lock-Less Work Stealing;2023

3. A Survey on Minimizing Lock Contention in Shared Resources in Linux Kernel;2022 13th International Conference on Information and Communication Technology Convergence (ICTC);2022-10-19

4. A Fine-Grained Page Management Scheme for HPC Manycore I/O Systems;SSRN Electronic Journal;2022

5. Finer-LRU: A Scalable Page Management Scheme for HPC Manycore Architectures;2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2021-05
