A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation

Published: 2020-03-11
Volume: 9
ISSN: 2050-084X
Container-title: eLife
Language: en

Author:

Quintana Daniel S¹

Affiliation:

1. Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, University of Oslo, and Oslo University Hospital, Oslo, Norway

Abstract

Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.

Funder

Novo Nordisk Foundation

Publisher

eLife Sciences Publications, Ltd

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

Link

https://cdn.elifesciences.org/articles/53275/elife-53275-v2.pdf

Reference: 49 articles.

1. Power spectrum analysis of heart rate fluctuation: a quantitative probe of beat-to-beat cardiovascular control; Akselrod; Science, 1981

2. Graphs in statistical analysis; Anscombe; The American Statistician, 1973

3. Using 26,000 diary entries to show ovulatory changes in sexual desire and behavior; Arslan; Journal of Personality and Social Psychology, 2018

4. Promoting access to public research data for scientific, economic, and social development; Arzberger; Data Science Journal, 2004

5. Recommendations for increasing replicability in psychology; Asendorpf; European Journal of Personality, 2013

Cited by 82 articles.

1. Open science needs a standardized data format: Suggestions for the field of psychoneuroendocrinology; Psychoneuroendocrinology; 2024-11

2. Synthetic data for reef modelling; Ecological Informatics; 2024-09

3. The relationship between gambling behaviour and gambling‐related harm: A data fusion approach using open banking data; Addiction; 2024-05-23

4. A Survey on the Use of Synthetic Data for Enhancing Key Aspects of Trustworthy AI in the Energy Domain: Challenges and Opportunities; Energies; 2024-04-23

5. Three Persistent Myths about Open Science; Journal of Trial and Error; 2024-04-08