Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, IL
Abstract
This article presents a tool for uncovering bugs due to interactive complexity in networked sensing applications. Such bugs are not localized to one faulty component, but rather result from complex and unexpected interactions between multiple, often individually nonfaulty, components. Moreover, the manifestations of these bugs are often not repeatable, which makes them particularly hard to find, because the particular sequence of events that invokes the bug may not be easy to reconstruct. Because of the distributed nature of failure scenarios, our tool looks for sequences of events that may be responsible for faulty behavior, as opposed to localized bugs such as a bad pointer in a module. We identify several challenges in applying discriminative sequence mining to root cause analysis when the system fails to perform as expected, and present our solutions to those challenges. We also present two alternative schemes, namely two-stage mining and progressive discriminative sequence mining, to address the scalability challenge. We develop an extensible framework in which a front-end collects runtime data logs of the system being debugged and an offline back-end uses frequent discriminative pattern mining to uncover likely causes of failure. We provide several case studies in which we applied our tool successfully to troubleshoot the cause of a problem. We uncovered a kernel-level race condition bug in the LiteOS operating system and a protocol design bug in the directed diffusion protocol. We also present a case study of debugging a multichannel MAC protocol that was found to exhibit corner cases of poor performance (worse than single-channel MAC). The tool helped to uncover event sequences that lead to a highly degraded mode of operation. Fixing the problem significantly improved the performance of the protocol. We also evaluate the extensions presented in this article. Finally, we provide a detailed analysis of tool overhead in terms of memory requirements and impact on the running application.
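To make the idea of discriminative sequence mining concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each log is a list of event names already labeled as coming from a "good" or a "bad" run, treats patterns as contiguous event subsequences of bounded length, and ranks them by the gap between their support in bad runs and in good runs. The function names, the `max_len` and `min_gap` parameters, and the sample events are all hypothetical.

```python
from collections import Counter


def ngrams(events, max_len):
    """Return the set of contiguous event subsequences of length 1..max_len."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(events) - n + 1):
            found.add(tuple(events[i:i + n]))
    return found


def discriminative_patterns(good_logs, bad_logs, max_len=3, min_gap=0.5):
    """Rank patterns by (support in bad runs) - (support in good runs).

    Support is the fraction of logs in a class that contain the pattern
    at least once. Only patterns whose gap is >= min_gap are returned.
    """
    if not good_logs or not bad_logs:
        raise ValueError("need at least one good log and one bad log")

    good_counts = Counter()
    bad_counts = Counter()
    for log in good_logs:
        good_counts.update(ngrams(log, max_len))  # one count per log
    for log in bad_logs:
        bad_counts.update(ngrams(log, max_len))

    results = []
    for pattern, bad_n in bad_counts.items():
        bad_support = bad_n / len(bad_logs)
        good_support = good_counts[pattern] / len(good_logs)
        gap = bad_support - good_support
        if gap >= min_gap:
            results.append((pattern, gap))

    # Largest gap first; among ties, prefer longer (more specific) patterns.
    results.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return results


# Hypothetical event logs, invented purely for illustration.
good_logs = [
    ["send", "ack", "sleep"],
    ["send", "ack", "send", "ack"],
]
bad_logs = [
    ["send", "timeout", "retry"],
    ["send", "ack", "send", "timeout", "retry"],
]

for pattern, gap in discriminative_patterns(good_logs, bad_logs):
    print(" -> ".join(pattern), f"(gap={gap:.2f})")
```

Events such as `send` that appear equally often in both classes score a gap of zero and are filtered out, which is the property that separates discriminative patterns from merely frequent ones. A real tool would also need to address the scalability concerns the abstract mentions, which this sketch ignores.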
Funder
Division of Computer and Network Systems
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
4 articles.