Affiliation:
1. Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, United States
2. CyVerse, University of Arizona, Arizona, United States
Abstract
The Identifier Services (IDS) project researched and built a prototype for managing distributed genomics datasets remotely and over time. Inspired by archival concepts, IDS allows researchers to track how a dataset evolves through multiple copies, modifications, and derivatives, independent of where the data are located: symbolically, in the research lifecycle, and physically, in a repository or storage facility. The prototype implementation is based on a three-step data modeling process: (a) understanding and recording different researcher workflows, (b) mapping the workflows and data to a generic data model and identifying functions, and (c) integrating the data model as architecture and interactive functions into cyberinfrastructure (CI). Identity functions are operationalized as continuous tracking of authenticity attributes, including data location, differences between seemingly identical datasets, metadata, data integrity, and the roles of the different types of local and global identifiers used during the research lifecycle. CI resources were used to perform identity functions at scale, including scheduling content comparison tasks on high-performance computing resources. The prototype was developed and evaluated against six data test cases, and feedback was gathered through a focus group. Although some technical roadblocks remain, the project demonstrates that identity functions are an innovative solution for managing large, distributed genomic datasets.
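The abstract lists data integrity and differences between seemingly identical datasets among the tracked authenticity attributes, but does not say how they are computed. As a minimal sketch only, assuming SHA-256 checksums as the integrity attribute and a per-file manifest keyed by relative path (both assumptions, not the IDS design), two copies of a dataset could be compared as below. The function names `checksum`, `manifest`, and `compare_copies` are hypothetical and not part of IDS; the sketch runs locally and does not model how IDS schedules such comparisons on high-performance computing resources.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    SHA-256 is an assumed choice of fixity algorithm for this sketch.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Hypothetical helper: map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(a: Path, b: Path) -> dict[str, list[str]]:
    """Hypothetical helper: classify files of two dataset copies by content."""
    ma, mb = manifest(a), manifest(b)
    shared = ma.keys() & mb.keys()
    return {
        # Same relative path and same checksum: byte-identical content.
        "identical": sorted(k for k in shared if ma[k] == mb[k]),
        # Same relative path but different checksum: a modified file.
        "modified": sorted(k for k in shared if ma[k] != mb[k]),
        # Files present in only one copy, e.g. added derivatives.
        "only_in_a": sorted(ma.keys() - mb.keys()),
        "only_in_b": sorted(mb.keys() - ma.keys()),
    }


if __name__ == "__main__":
    # Toy example: two copies share one file, differ in one, and copy_b adds a file.
    with tempfile.TemporaryDirectory() as d:
        a, b = Path(d, "copy_a"), Path(d, "copy_b")
        for root in (a, b):
            root.mkdir()
            (root / "reads.fastq").write_text("@r1\nACGT\n+\nIIII\n")
        (a / "sample.vcf").write_text("chr1\t100\tA\tG\n")
        (b / "sample.vcf").write_text("chr1\t100\tA\tT\n")
        (b / "notes.txt").write_text("derived copy\n")
        print(compare_copies(a, b))
```

Comparing checksums rather than file names or sizes is what lets two files that look the same be told apart by content; a real deployment would also need to record where each copy lives and which identifiers point to it, which this sketch leaves out.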
Subject
Geology, Ocean Engineering, Water Science and Technology