Author:
Im Sung B.,Gupta Sonali,Jain Mani,Chande Aroon T.,Carleton Heather A.,Jordan I. King,Rishishwar Lavanya
Abstract
Foodborne pathogens are a major public health burden in the United States, leading to 9.4 million illnesses annually. Since 1996, a national laboratory-based surveillance program, PulseNet, has used molecular subtyping and serotyping methods with the aim to reduce the burden of foodborne illness through early detection of emerging outbreaks. PulseNet affiliated laboratories have used pulsed-field gel electrophoresis (PFGE) and immunoassays to subtype and serotype bacterial isolates. Widespread use of serotyping and PFGE for foodborne illness surveillance over the years has resulted in the accumulation of a wealth of routine surveillance and outbreak epidemiological data. This valuable source of data has been used to understand seasonal frequency, geographic distribution, demographic information, exposure information, disease severity, and source of foodborne isolates. In 2019, PulseNet adopted whole genome sequencing (WGS) at a national scale to replace PFGE with higher-resolution methods such as the core genome multilocus sequence typing. Consequently, PulseNet's recent shift to genome-based subtyping methods has rendered the vast collection of historic surveillance data associated with serogroups and PFGE patterns potentially unusable. The goal of this study was to develop a bioinformatics method to associate the WGS data that are currently used by PulseNet for bacterial pathogen subtyping to previously characterized serogroup and PFGE patterns. Previous efforts to associate WGS to PFGE patterns relied on predicting DNA molecular weight based on restriction site analysis. However, these approaches failed owing to the non-uniform usage of genomic restriction sites by PFGE restriction enzymes. We developed a machine learning approach to classify isolates to their most probable serogroup and PFGE pattern, based on comparisons of genomic k-mer signatures. We applied our WGS classification method to 5,970 Shiga toxin-producing Escherichia coli (STEC) isolates collected as part of PulseNet's routine foodborne surveillance activities between 2003 and 2018. Our machine learning classifier is able to associate STEC WGS to higher-level serogroups with very high accuracy and lower-level PFGE patterns with somewhat lower accuracy. Taken together, these classifications support the ability of public health investigators to associate currently generated WGS data with historical epidemiological knowledge linked to serogroups and PFGE patterns in support of outbreak surveillance for food safety and public health.
Subject
Horticulture,Management, Monitoring, Policy and Law,Agronomy and Crop Science,Ecology,Food Science,Global and Planetary Change
Reference37 articles.
1. Guidance for public health laboratories on the isolation and characterization of Shiga toxin-producing Escherichia coli (STEC) from clinical specimens;Atikson;Guidance for Public Health Laboratories on the Isolation and Characterization of Shigatoxin-Producing Escherichia coli (STEC) from Clinical Specimens,2012
2. Rapid detection of Shiga toxin-producing bacteria in feces by multiplex PCR with molecular beacons on the smart cycler;Belanger;J. Clin. Microbiol.,2002
3. In silico analysis of complete bacterial genomes: PCR, AFLP-PCR and endonuclease restriction;Bikandi;Bioinformatics,2004
4. Random forests;Breiman;Mach. Learn.,2001
5. Non-O157 Shiga toxin-producing Escherichia coli infections in the United States, 1983-2002;Brooks;J. Infect. Dis.,2005
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献