Performance Bug Analysis and Detection for Distributed Storage and Computing Systems

Author:

Li Jiaxin (1), Zhang Yiming (2), Lu Shan (3), Gunawi Haryadi S. (3), Gu Xiaohui (4), Huang Feng (1), Li Dongsheng (1)

Affiliation:

1. National University of Defense Technology, China

2. National University of Defense Technology and Xiamen University, China

3. University of Chicago, USA

4. North Carolina State University, USA

Abstract

This article systematically studies 99 distributed performance bugs from five widely deployed distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop MapReduce, and ZooKeeper). We present the TaxPerf database, which organizes the analysis results as over 400 classification labels and over 2,500 lines of bug re-description. TaxPerf classifies the bugs into six categories (and 18 subcategories) by root cause: resource, blocking, synchronization, optimization, configuration, and logic. TaxPerf can be used as a benchmark for performance bug studies and for the design of debugging tools. Although it is impractical to automatically detect all categories of performance bugs in TaxPerf, we find that one important category, blocking bugs, can be effectively addressed by analysis tools. We analyze the cascading nature of blocking bugs and design an automatic detection tool called PCatch, which (i) performs program analysis to identify code regions whose execution time can potentially increase dramatically with the workload size; (ii) adapts the traditional happens-before model to reason about software resource contention and performance dependency relationships; and (iii) uses dynamic tracking to identify whether the slowdown propagation is contained in one job. Evaluation shows that PCatch can accurately detect blocking bugs in representative distributed storage and computing systems by observing system executions under small-scale workloads.
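To make the three steps concrete, the following is a minimal toy sketch of the idea, not PCatch's actual implementation. It assumes a trace has already been reduced to named code regions, each tagged with the job it belongs to and a flag saying whether its running time grows with workload size (step i). The `delays` edges stand in for the contention and happens-before style dependencies of step (ii), and the reachability check approximates step (iii) by reporting slowdowns that escape the originating job. All region names, job names, and data structures here are hypothetical.

```python
from collections import deque

# Hypothetical toy model; these names are illustrative, not from PCatch.
# region name -> (owning job, does its running time scale with workload size?)
regions = {
    "compaction_loop": ("job_A", True),   # loop whose iteration count grows with input
    "flush_handler":   ("job_A", False),
    "heartbeat":       ("job_B", False),
}

# Edge u -> v means "v cannot finish until u does", for example because
# v waits on a lock that u holds or is ordered after u.
delays = {
    "compaction_loop": ["flush_handler"],
    "flush_handler": ["heartbeat"],
}

def cascading_victims(regions, delays):
    """Return (source, victim) pairs where a workload-sensitive region
    can transitively delay a region that belongs to a different job."""
    found = []
    for source, (job, scales) in regions.items():
        if not scales:
            continue  # only workload-sensitive regions can start a cascade
        seen = {source}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in delays.get(current, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                queue.append(nxt)
                if regions[nxt][0] != job:
                    # the slowdown is not contained in the originating job
                    found.append((source, nxt))
    return sorted(found)

print(cascading_victims(regions, delays))
```

In this toy input, the slowdown of `compaction_loop` reaches `heartbeat` in another job through `flush_handler`, so the pair is reported; a dependency chain that stays inside `job_A` would not be.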

Funder

National Key Research and Development Program of China

Scientific Research Program of National University of Defense Technology

National Natural Science Foundation of China

Natural Science Foundation of Hunan Province of China

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture
