FAIR data pipeline: provenance-driven data management for traceable scientific workflows

Author:

Mitchell Sonia Natalie12, Lahiff Andrew3, Cummings Nathan3, Hollocombe Jonathan3, Boskamp Bram4, Field Ryan5, Reddyhoff Dennis6, Zarebski Kristian3, Wilson Antony7, Viola Bruno3, Burke Martin4, Archibald Blair8, Bessell Paul9, Blackwell Richard10, Boden Lisa A.9, Brett Alys3, Brett Sam, Dundas Ruth5, Enright Jessica28, Gonzalez-Beltran Alejandra N.7, Harris Claire24, Hinder Ian11, David Hughes Christopher10, Knight Martin4, Mano Vino10, McMonagle Ciaran25, Mellor Dominic212, Mohr Sibylle12, Marion Glenn24, Matthews Louise12, McKendrick Iain J.24, Mark Pooley Christopher4, Porphyre Thibaud13, Reeves Aaron14, Townsend Edward, Turner Robert6, Walton Jeremy15, Reeve Richard12

Affiliation:

1. Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK

2. Boyd Orr Centre for Population and Ecosystem Health, University of Glasgow, Glasgow, G12 8QQ, UK

3. United Kingdom Atomic Energy Authority, Didcot OX14 3DB, UK

4. Biomathematics and Statistics Scotland (BioSS), James Clerk Maxwell Building, Peter Guthrie Tait Road, The King’s Buildings, Edinburgh EH9 3FD, UK

5. MRC/CSO Social and Public Health Sciences Unit, Institute of Health and Wellbeing, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK

6. Department of Computer Science, University of Sheffield, Regent Court, Sheffield S1 4DP, UK

7. Science and Technology Facilities Council, Harwell Campus, Harwell OX11, UK

8. School of Computing Science, College of Science and Engineering, University of Glasgow, Glasgow, G12 8QQ, UK

9. Roslin Institute, University of Edinburgh, Edinburgh EH8 9YL, UK

10. Man Group plc, Riverbank House, 2 Swan Lane, London EC4R 3AD, UK

11. The University of Manchester, Research IT, Manchester M1 3BU, UK

12. School of Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G61 1QH, UK

13. VetAgro Sup, UMR5558 Laboratoire de Biométrie et Biologie Évolutive, Campus vétérinaire de Lyon, Marcy-l’Etoile 69280, France

14. Scotland’s Rural College (SRUC), Peter Wilson Building, The King’s Buildings, West Mains Road, Edinburgh EH9 3JG, UK

15. UK Earth System Model Core Group, Met Office, Exeter EX1 3PB, UK

Abstract

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

Funder

Engineering and Physical Sciences Research Council

Biotechnology and Biological Sciences Research Council

Scottish Government Chief Scientist Office

Natural Environment Research Council

Agence Nationale de la Recherche

Science and Technology Facilities Council

UK Atomic Energy Authority

Medical Research Council

Boehringer Ingelheim Animal Health France

Rural and Environment Science and Analytical Services Division

Publisher

The Royal Society

Subject

General Physics and Astronomy, General Engineering, General Mathematics


Cited by 17 articles.
