Affiliation:
1. Tel Aviv University, Israel
2. University of Pennsylvania
3. Ben Gurion University, Israel
Abstract
Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing users to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
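The distinction between coarse-grained and fine-grained dependencies can be illustrated with a minimal Python sketch. It is not the Lipstick implementation, it does not use Pig Latin, and it does not reproduce the paper's provenance graph or its ZoomOut operation. The `Module` class, its `calls` counter, the `in:`/`state:` labels, and the `coarse_view` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical workflow module whose internal state is a call counter."""
    name: str
    calls: int = 0

    def invoke(self, inputs):
        """Run the module; return its outputs and a fine-grained dependency map."""
        self.calls += 1  # internal state changes between repeated executions
        outputs = {
            "total": inputs["a"] + inputs["b"],
            "tag": f"{inputs['c']}-{self.calls}",
        }
        # Fine-grained view: each output lists only what it actually used,
        # including internal state where relevant.
        fine = {
            "total": {"in:a", "in:b"},
            "tag": {"in:c", f"state:{self.name}.calls"},
        }
        return outputs, fine


def coarse_view(fine, inputs):
    """Black-box view: every output depends on all inputs, state is ignored."""
    all_inputs = {f"in:{key}" for key in inputs}
    return {out: set(all_inputs) for out in fine}


module = Module("M")
inputs = {"a": 1, "b": 2, "c": "x"}
outputs, fine = module.invoke(inputs)
coarse = coarse_view(fine, inputs)

for out in sorted(outputs):
    print(out, "fine:", sorted(fine[out]), "coarse:", sorted(coarse[out]))
```

In this toy setting the black-box view reports that `total` depends on input `c`, which it never reads, and it omits the dependency of `tag` on the module's state. These are the two gaps the abstract attributes to coarse-grained workflow provenance.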
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
75 articles.