Affiliation:
1. University of California
2. Unaffiliated
3. University of Pennsylvania
Abstract
Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about code used to generate various notebook data artifacts is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain relevant code for artifacts) and conservative (containing unnecessary code for an artifact). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer's ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.
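To make the two slice directions concrete, the following is a minimal sketch and not nbslicer's implementation. It assumes the dynamic data dependencies between cells have already been recorded as a mapping from each cell to the cells whose data it read; the names `deps`, `backward_slice`, and `forward_slice` and the five-cell session are hypothetical.

```python
from collections import defaultdict


def backward_slice(deps, target):
    """Return `target` plus every cell it transitively depends on."""
    seen, stack = set(), [target]
    while stack:
        cell = stack.pop()
        if cell in seen:
            continue
        seen.add(cell)
        # Follow edges from this cell to the cells it read data from.
        stack.extend(deps.get(cell, ()))
    return sorted(seen)


def forward_slice(deps, source):
    """Return `source` plus every cell transitively affected by rerunning it."""
    # Invert the dependency edges, then reuse the same traversal.
    dependents = defaultdict(set)
    for cell, parents in deps.items():
        for parent in parents:
            dependents[parent].add(cell)
    return backward_slice(dependents, source)


# Hypothetical session (assumption): cell id -> cells whose data it read.
deps = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: set()}

print(backward_slice(deps, 4))  # cells needed to reproduce cell 4
print(forward_slice(deps, 1))   # cells to re-execute after rerunning cell 1
```

In this toy session, cells 3 and 5 are excluded from the backward slice of cell 4 because cell 4 never read their data. That is the kind of superfluous code a slicer aims to filter out.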
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
5 articles.
1. Using Run-Time Information to Enhance Static Analysis of Machine Learning Code in Notebooks; Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering; 2024-07-10
2. ElasticNotebook: Enabling Live Migration for Computational Notebooks; Proceedings of the VLDB Endowment; 2023-10
3. Facilitating Dependency Exploration in Computational Notebooks; Proceedings of the Workshop on Human-In-the-Loop Data Analytics; 2023-06-18
4. Data Makes Better Data Scientists; Proceedings of the Workshop on Human-In-the-Loop Data Analytics; 2023-06-18
5. Bolt-on, Compact, and Rapid Program Slicing for Notebooks; Proceedings of the VLDB Endowment; 2022-09