Affiliation:
1. Natural Diversity Discovery Project
2. UC Davis Genome Center
Abstract
We describe an approach for pipelining nested data collections in scientific workflows. Our approach logically delimits arbitrarily nested collections of data tokens using special, paired control tokens inserted into token streams, and provides workflow components with high-level operations for managing these collections. Our framework provides new capabilities for: (1) concurrent operation on collections; (2) on-the-fly customization of workflow component behavior; (3) improved handling of exceptions and faults; and (4) transparent passing of provenance and metadata within token streams. We demonstrate our approach using a workflow for inferring phylogenetic trees. We also describe future extensions to support richer typing mechanisms for facilitating sharing and reuse of workflow components between disciplines. This work represents a step towards our larger goal of exploiting collection-oriented dataflow programming as a new paradigm for scientific workflow systems, an approach we believe will significantly reduce the complexity of creating and reusing workflows and workflow components.
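The core idea of the abstract, delimiting nested collections with paired control tokens inside a flat token stream, can be illustrated with a minimal sketch. Everything named here (`Open`, `Close`, `flatten`, `map_collection`, and the collection labels) is a hypothetical illustration and is not taken from the paper's framework; the sketch only assumes that a component tracks collection boundaries with a stack and passes through every token it does not handle.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Union


@dataclass(frozen=True)
class Open:
    """Hypothetical control token marking the start of a collection."""
    label: str


@dataclass(frozen=True)
class Close:
    """Hypothetical control token marking the end of a collection."""
    label: str


Token = Union[Open, Close, str]


def flatten(label: str, items: list) -> Iterator[Token]:
    """Serialize a nested (label, items) structure into a flat token stream."""
    yield Open(label)
    for item in items:
        if isinstance(item, tuple):
            sub_label, sub_items = item
            yield from flatten(sub_label, sub_items)
        else:
            yield item
    yield Close(label)


def map_collection(
    stream: Iterable[Token], target: str, fn: Callable[[str], str]
) -> Iterator[Token]:
    """Apply fn to data tokens directly inside collections labeled `target`.

    All other tokens, including the delimiters themselves, pass through
    unchanged, so downstream components still see the full nesting.
    """
    stack: List[str] = []
    for tok in stream:
        if isinstance(tok, Open):
            stack.append(tok.label)
            yield tok
        elif isinstance(tok, Close):
            if not stack or stack[-1] != tok.label:
                raise ValueError(f"unbalanced close token: {tok.label}")
            stack.pop()
            yield tok
        elif stack and stack[-1] == target:
            yield fn(tok)
        else:
            yield tok
    if stack:
        raise ValueError(f"unclosed collection: {stack[-1]}")


# Usage: only the tokens inside the "alignment" collection are transformed.
nested = ("project", ["readme", ("alignment", ["acgt", "ttga"]), ("tree", ["(a,b)"])])
result = list(map_collection(flatten(*nested), "alignment", str.upper))
```

Because the component is a generator, it processes tokens as they arrive and never needs to hold a whole collection in memory, which is one plausible reading of how such delimiters permit pipelined operation on collections.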
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
19 articles.
1. MCEM;Proceedings of the 9th International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '19;2019
2. Scientific Workflows;ACM Computing Surveys;2017-12-31
3. Provenance-Driven Data Curation Workflow Analysis;Proceedings of the 2015 ACM SIGMOD on PhD Symposium;2015-05-31
4. A Workflow for the Prediction of the Effects of Residue Substitution on Protein Stability;Pattern Recognition in Bioinformatics;2013
5. Enforcing QoS in scientific workflow systems enacted over Cloud infrastructures;Journal of Computer and System Sciences;2012-09