A schema for digitized surface swab site metadata in open-source DNA sequence databases

Author:

Feng Barry,Daeschel Devin,Dooley Damion,Griffiths Emma,Allard Marc,Timme Ruth,Chen Yi,Snyder Abigail B.

Abstract

ABSTRACTLarge, open-source DNA sequence databases have been generated, in part, through the collection of microbial pathogens from swabbing surfaces in built environments. Analyzing these data in aggregate through public health surveillance requires digitization of the complex, domain-specific metadata associated with swab site locations. However, the swab site location information is currently collected in a single, free-text “isolation source” field promoting generation of poorly detailed descriptions with varying word order, granularity, and linguistic errors, making automation difficult and reducing machine-actionability. We assessed 1,498 free-text swab site descriptions generated during routine foodborne pathogen surveillance. The lexicon of free-text metadata was evaluated to determine the informational facets and quantity of unique terms used by data collectors. Open Biological Ontologies (OBO) foundry libraries were used to develop hierarchical vocabularies connected with logical relationships to describe swab site locations. Five informational facets described by 338 unique terms were identified via content analysis. Term hierarchy facets were developed as were statements (called axioms) about how entities within these five domains were related. The schema developed through this study has been integrated into a publicly available pathogen metadata standard, facilitating ongoing surveillance and investigations. The One Health Enteric Package is available at NCBI BioSample beginning in 2022. Collective use of metadata standards increases the interoperability of DNA sequence databases, enabling large-scale approaches to data sharing, artificial intelligence, and big-data solutions to food safety.IMPORTANCERegular analysis of whole genome sequence data in collections such as NCBI’s Pathogen Detection Database is used by many public health organizations to detect outbreaks of infectious disease. However, isolate metadata in these databases are often incomplete and poor quality. These complex raw metadata must often be re-organized and manually formatted for use in aggregate analysis. These processes are inefficient and time-consuming, increasing the interpretative labor needed by public health groups to extract actionable information. Future use of open genomic epidemiology networks will be supported through the development of an internationally applicable vocabulary system to describe swab site locations.

Publisher

Cold Spring Harbor Laboratory

Reference41 articles.

1. Amezquita A , Barretto C , Winkler A , et al. The Benefits and Barriers of Whole-Genome Sequencing for Pathogen Source Tracking: A Food Industry Perspective. Food Saf Mag 2020; Available at: https://www.food-safety.com/articles/6696-the-benefits-and-barriers-of-whole-genome-sequencing-for-pathogen-source-tracking-a-food-industry-perspective.

2. Environmental microbiome mapping as a strategy to improve quality and safety in the food industry;Curr Opin Food Sci,2021

3. Swabbing the surface: critical factors in environmental monitoring and a path towards standardization and improvement;Crit Rev Food Sci Nutr,2020

4. Interpretative Labor and the Bane of Nonstandardized Metadata in Public Health Surveillance and Food Safety;Clin Infect Dis,2021

5. Metadata matters: access to image data in the real world

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3