Abstract
AbstractBiological science produces “big data” in varied formats, which necessitates using computational tools to process, integrate, and analyse data. Researchers using computational biology tools range from those using computers for communication, to those writing analysis code. We examine differences in how researchers conceptualise the same data, which we call “subjective data models”. We interviewed 22 people with biological experience and varied levels of computational experience, and found that many had fluid subjective data models that changed depending on circumstance. Surprisingly, results did not cluster around participants’ computational experience levels. People did not consistently map entities from abstract data models to the real-world entities in files, and certain data identifier formats were easier to infer meaning from than others. Real-world implications: 1) software engineers should design interfaces for task performance, emulating popular user interfaces, rather than targeting professional backgrounds; 2) when insufficient context is provided, people may guess what data means, whether or not they are correct, emphasising the importance of contextual metadata to remove the need for erroneous guesswork.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability
Reference30 articles.
1. Park, J. C., Kim, T. & Park, J. Monitoring the evolutionary aspect of the Gene Ontology to enhance predictability and usability. BMC Bioinformatics 9, S7 https://www.covid19dataportal.org/ (2008).
2. Alliance of Genome Resources Consortium. Alliance of Genome Resources Portal: unified model organism research platform. Nucleic Acids Res. 48, D650–D658 (2020).
3. Alliance of Genome Resources Consortium. The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases. Genetics 213, 1189–1196 (2019).
4. Harrison, P. W. et al. The COVID-19 Data Portal: accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing. Nucleic Acids Res. 49, W619–W623 (2021).
5. Harrison, P. W., Lopez, R., Rahman, N. & Allen, S. G. COVID-19 Data Portal - accelerating scientific research through data. https://www.covid19dataportal.org/ (2021).