Recording provenance of workflow runs with RO-Crate

Author:

Leo Simone, Crusoe Michael R., Rodríguez-Navas Laura, Sirvent Raül, Kanitz Alexander, De Geest Paul, Wittner Rudolf, Pireddu Luca, Garijo Daniel, Fernández José M., Colonnelli Iacopo, Gallo Matej, Ohta Tazro, Suetake Hirotaka, Capella-Gutierrez Salvador, de Wit Renske, Kinoshita Bruno P., Soiland-Reyes Stian

Abstract

Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
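As a rough illustration of the idea in the abstract, the sketch below builds a minimal RO-Crate-style JSON-LD document in which a single run is described as a Schema.org CreateAction that links a workflow (instrument) to its inputs (object) and outputs (result). The function name, the identifier "#run-1" and the file names are hypothetical and chosen only for this example; the actual Workflow Run RO-Crate profiles define further requirements that are not shown here.

```python
import json


def build_run_crate(workflow_id, input_ids, output_ids):
    """Return a minimal RO-Crate-style JSON-LD dict describing one workflow run.

    Hypothetical helper for illustration only; not part of any RO-Crate library.
    """
    # Metadata descriptor: points at the root dataset of the crate.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: bundles the workflow, inputs and outputs together.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [{"@id": i} for i in [workflow_id, *input_ids, *output_ids]],
        "mentions": {"@id": "#run-1"},
    }
    # The workflow definition itself.
    workflow = {
        "@id": workflow_id,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    }
    # The run: an action whose instrument is the workflow,
    # whose objects are the inputs and whose results are the outputs.
    run = {
        "@id": "#run-1",
        "@type": "CreateAction",
        "name": "Example workflow run",
        "instrument": {"@id": workflow_id},
        "object": [{"@id": i} for i in input_ids],
        "result": [{"@id": o} for o in output_ids],
    }
    files = [{"@id": f, "@type": "File"} for f in [*input_ids, *output_ids]]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow, run, *files],
    }


if __name__ == "__main__":
    crate = build_run_crate("workflow.cwl", ["input.csv"], ["output.png"])
    print(json.dumps(crate, indent=2))
```

Because every run is expressed with the same action structure regardless of the engine that produced it, a consumer can compare runs from different workflow systems by reading the instrument, object and result links.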

Funder

Regione Autonoma della Sardegna

Spanish government

Generalitat de Catalunya

European High-Performance Computing Joint Undertaking

European Union

ELIXIR

Research Foundation - Flanders (FWO) for ELIXIR Belgium

Universidad Politécnica de Madrid

Comunidad de Madrid

European Union - NextGenerationEU

National Bioscience Database Center

Horizon 2020 Framework Programme

HORIZON EUROPE Framework Programme

UK Research and Innovation

Publisher

Public Library of Science (PLoS)

