Affiliation:
1. Universidade Federal Fluminense, Brazil
2. New York University, Brooklyn, New York, NY, United States of America
3. Universidade Federal Fluminense
Abstract
Scripts are widely used to design and run scientific experiments. Scripting languages are easy to learn and use, and they allow complex tasks to be specified and executed in fewer steps than with traditional programming languages. However, they also have important limitations for reproducibility and data management. As experiments are iteratively refined, it is challenging to reason about each experiment run (or trial), to keep track of the association between trials and experiment instances as well as the differences across trials, and to connect results to specific input data and parameters. Approaches have been proposed that address these limitations by collecting, managing, and analyzing the provenance of scripts. In this article, we survey the state of the art in provenance for scripts. We have identified the approaches by following an exhaustive protocol of forward and backward literature snowballing. Based on a detailed study, we propose a taxonomy and classify the approaches using this taxonomy.
Funder
AT&T
DARPA
Moore-Sloan Data Science Environment at NYU
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
National Science Foundation
Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science,Theoretical Computer Science
Cited by
34 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献