Affiliation:
1. University of Massachusetts Amherst, Amherst, MA, USA
Abstract
Testing and static analysis can help root out bugs in programs, but not in data. This paper introduces data debugging, an approach that combines program analysis and statistical analysis to automatically find potential data errors. Since it is impossible to know a priori whether data are erroneous, data debugging instead locates data that has a disproportionate impact on the computation. Such data is either very important, or wrong. Data debugging is especially useful in the context of data-intensive programming environments that intertwine data with programs in the form of queries or formulas.
We present the first data debugging tool, CheckCell, an add-in for Microsoft Excel. CheckCell identifies cells that have an unusually high impact on the spreadsheet's computations. We show that CheckCell is both analytically and empirically fast and effective. We show that it successfully finds injected typographical errors produced by a generative model trained with data entry from 169,112 Mechanical Turk tasks. CheckCell is more precise and efficient than standard outlier detection techniques. CheckCell also automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.
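The core idea in the abstract, flagging inputs whose effect on a computation is unusually large, can be illustrated with a small sketch. This is not CheckCell's algorithm, which the abstract does not spell out; it is a simplified leave-one-out substitution under stated assumptions. The function names `impact_scores` and `flag_high_impact`, the `ratio` threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import median


def impact_scores(values, formula):
    """For each input, measure how far the formula's output moves when that
    input is replaced by the median of the remaining inputs.
    (Illustrative assumption: median substitution stands in for a 'typical' value.)"""
    baseline = formula(values)
    scores = []
    for i in range(len(values)):
        others = values[:i] + values[i + 1:]
        substituted = values[:i] + [median(others)] + values[i + 1:]
        scores.append(abs(formula(substituted) - baseline))
    return scores


def flag_high_impact(values, formula, ratio=5.0):
    """Return indices whose impact exceeds `ratio` times the median impact.
    The threshold is an arbitrary illustrative choice, not a tuned statistic."""
    scores = impact_scores(values, formula)
    typical = median(scores)
    return [i for i, s in enumerate(scores) if s > 0 and s > ratio * typical]


# A column summed by a spreadsheet-style formula; 110.0 may be a typo for 11.0.
cells = [10.0, 12.0, 11.0, 9.0, 110.0, 10.5]
print(flag_high_impact(cells, sum))  # → [4]
```

As the abstract notes, a flagged value is either very important or wrong; a sketch like this can only point at candidates for a person to review.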
Funder
Microsoft
Division of Computing and Communication Foundations
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
12 articles.