Accessible data curation and analytics for international-scale citizen science datasets

Published: 2021-11-22; Volume: 8; Issue: 1
ISSN: 2052-4463
Journal: Scientific Data (Sci Data)
Language: English

Authors:

Murray, Benjamin; Kerfoot, Eric; Chen, Liyuan; Deng, Jie; Graham, Mark S.; Sudre, Carole H.; Molteni, Erika; Canas, Liane S.; Antonelli, Michela; Klaser, Kerstin; Visconti, Alessia; Hammers, Alexander; Chan, Andrew T.; Franks, Paul W.; Davies, Richard; Wolf, Jonathan; Spector, Tim D.; Steves, Claire J.; Modat, Marc; Ourselin, Sebastien

Abstract

The Covid Symptom Study, a smartphone-based surveillance study of COVID-19 symptoms in the population, is an exemplar of big data citizen science. Since its launch in March 2020, over 5 million participants have collectively logged more than 360 million self-assessment reports (as of May 23rd, 2021). This success creates significant technical challenges for effective data curation, chiefly of scale: the dataset is now too large to be processed readily with standard Python-based data analytics software such as Pandas on commodity hardware. Alternative technologies exist, but they are more technically complex and less accessible to many researchers. We present ExeTera, an open-source Python package designed to provide Pandas-like data analytics on datasets approaching terabyte scale. We describe its design and capabilities, and show how it serves as a critical component of a data curation pipeline that enables reproducible research across an international research group for the Covid Symptom Study.

Funder

RCUK | Engineering and Physical Sciences Research Council

Wellcome Trust

Chronic Disease Research Foundation

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-021-01071-x.pdf

References (33 articles)

1. Silvertown, J. A new dawn for citizen science. Trends in Ecology & Evolution 24, 467–471 (2009).

2. Newman, G. et al. The future of citizen science: emerging technologies and shifting paradigms. Frontiers in Ecology and the Environment 10, 298–304 (2012).

3. Follett, R. & Strezov, V. An analysis of citizen science based research: usage and publication patterns. PLoS ONE 10, e0143687 (2015).

4. Heigl, F., Kieslinger, B., Paul, K. T., Uhlik, J. & Dörler, D. Opinion: Toward an international definition of citizen science. Proceedings of the National Academy of Sciences 116, 8089–8092 (2019).

5. Drew, D. A. et al. Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science 368, 1362–1367 (2020).

Cited by 17 articles.

1. Participatory to what end? Mapping motivations for participatory approaches in data-driven projects. Proceedings of the 2024 International Conference on Information Technology for Social Good (2024-09-04).

2. Citizen science: Technology tools to empower consumers for more resilient and sustainable healthcare systems during COVID-19 and beyond. Resilient Health (2024).

3. SARS-CoV-2 infection following booster vaccination: Illness and symptom profile in a prospective, observational community-based case-control study. Journal of Infection (2023-12).

4. Machine Learning-Based Approach to Developing Potent EGFR Inhibitors for Breast Cancer: Design, Synthesis, and In Vitro Evaluation. ACS Omega (2023-08-23).

5. The effects of COVID-19 on cognitive performance in a community-based cohort: a COVID symptom study biobank prospective cohort study. eClinicalMedicine (2023-08).