Randomized testing of distributed systems with probabilistic guarantees

Author:

Ozkan Burcu Kulahcioglu (1), Majumdar Rupak (1), Niksic Filip (1), Befrouei Mitra Tabaei (2), Weissenbacher Georg (2)

Affiliation:

1. MPI-SWS, Germany

2. Vienna University of Technology, Austria

Abstract

Several recently proposed randomized testing tools for concurrent and distributed systems come with theoretical guarantees on their success. The key to these guarantees is a notion of bug depth (the minimum length of a sequence of events sufficient to expose the bug) and a characterization of d-hitting families of schedules (sets of schedules guaranteed to cover every bug of a given depth d). Previous results show that in certain cases the size of a d-hitting family can be significantly smaller than the total number of possible schedules. However, these results assume either shared-memory multithreading or that the underlying partial ordering of events is known statically and has special structure. These assumptions are not met by distributed message-passing applications. In this paper, we present a randomized scheduling algorithm for testing distributed systems. In contrast to previous approaches, our algorithm works for arbitrary partially ordered sets of events revealed online as the program is being executed. We show that for partial orders of width at most w and size at most n (both statically unknown), our algorithm is guaranteed to sample from at most $w^2 n^{d-1}$ schedules, for every fixed bug depth d. Thus, our algorithm discovers a bug of depth d with probability at least $1/(w^2 n^{d-1})$. As a special case, our algorithm recovers a previous randomized testing algorithm for multithreaded programs. Our algorithm is simple to implement, but the correctness arguments depend on difficult combinatorial results about online dimension and online chain partitioning of partially ordered sets. We have implemented our algorithm in a randomized testing tool for distributed message-passing programs. We show that our algorithm can find bugs in distributed systems such as Zookeeper and Cassandra, and empirically outperforms naive random exploration while providing theoretical guarantees.
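
To make the stated bound concrete, the following minimal Python sketch evaluates the lower bound 1/(w^2 n^(d-1)) from the abstract and estimates how many test runs would be needed to reach a target detection probability. The function names are hypothetical and do not come from the paper or its tool. The run-count estimate additionally assumes that test runs are independent, which the abstract does not state.

```python
import math


def hit_probability_lower_bound(w: int, n: int, d: int) -> float:
    """Per-run lower bound 1 / (w^2 * n^(d-1)) quoted in the abstract.

    w: upper bound on the width of the partial order of events
    n: upper bound on the number of events
    d: bug depth
    """
    if w < 1 or n < 1 or d < 1:
        raise ValueError("w, n and d must all be positive integers")
    return 1.0 / (w ** 2 * n ** (d - 1))


def runs_for_confidence(p: float, confidence: float) -> int:
    """Smallest k such that 1 - (1 - p)^k >= confidence.

    Assumption (not from the source): runs are independent and each
    one exposes the bug with probability at least p.
    """
    if not (0.0 < p <= 1.0):
        raise ValueError("p must be in (0, 1]")
    if not (0.0 <= confidence < 1.0):
        raise ValueError("confidence must be in [0, 1)")
    if p == 1.0:
        return 1
    # log1p(-x) computes log(1 - x) accurately for small x.
    k = math.ceil(math.log1p(-confidence) / math.log1p(-p))
    return max(1, k)


if __name__ == "__main__":
    # Illustrative values only: width 4, 100 events, bug depth 2.
    p = hit_probability_lower_bound(w=4, n=100, d=2)
    k = runs_for_confidence(p, confidence=0.95)
    print(f"per-run lower bound: {p}")
    print(f"runs for 95% detection probability: {k}")
```

Because the bound falls off polynomially in n with exponent d - 1, the required number of runs grows quickly with bug depth; the guarantee is most useful for bugs of small depth.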

Funder

European Research Council

Vienna Science and Technology Fund

Austrian Science Fund

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality, Software

Cited by 22 articles.

1. Generalized Concurrency Testing Tool for Distributed Systems;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11

2. Greybox Fuzzing for Concurrency Testing;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2024-04-27

3. A Domain Specific Language for Testing Distributed Protocol Implementations;Lecture Notes in Computer Science;2024

4. Leveraging TLA+ Specifications to Improve the Reliability of the ZooKeeper Coordination Service;Dependable Software Engineering. Theories, Tools, and Applications;2023-12-15

5. Greybox Fuzzing of Distributed Systems;Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security;2023-11-15
