Abstract
Shigella and enteroinvasive Escherichia coli (EIEC) cause human bacillary dysentery through similar invasion mechanisms and share similar physiological, biochemical and genetic characteristics. The ability to differentiate Shigella and EIEC from each other is important for clinical diagnosis and epidemiologic investigations. The existing genetic signatures may not discriminate between Shigella and EIEC. However, phylogenetically, Shigella and EIEC strains are composed of multiple clusters and are different forms of E. coli. In this study, we identified 10 Shigella clusters, 7 EIEC clusters and 53 sporadic types of EIEC by examining over 17,000 publicly available Shigella/EIEC genomes. We compared Shigella and EIEC accessory genomes to identify cluster-specific gene markers or marker sets for the 17 clusters and 53 sporadic types. The gene markers showed 99.63% accuracy and more than 97.02% specificity. In addition, we developed a freely available in silico serotyping pipeline, Shigella EIEC Cluster Enhanced Serotype Finder (ShigEiFinder), which incorporates the cluster-specific gene markers together with established Shigella/EIEC serotype-specific O antigen genes and modification genes. ShigEiFinder can process either paired-end Illumina sequencing reads or assembled genomes and almost perfectly differentiated Shigella from EIEC, with 99.70% and 99.81% cluster assignment accuracy for assembled genomes and mapped reads, respectively. ShigEiFinder was able to serotype over 59 Shigella serotypes and 22 EIEC serotypes, with a serotyping specificity of 99.40% for assembled genomes and 99.38% for mapped reads. The cluster markers and our new serotyping tool, ShigEiFinder (https://github.com/LanLab/ShigEiFinder), will be useful for epidemiologic and diagnostic investigations.

Impact statement
The differentiation of Shigella strains from enteroinvasive E. coli (EIEC) is important for clinical diagnosis and public health epidemiologic investigations. The similarities between Shigella and EIEC strains make this differentiation very difficult, as both share common ancestries within E. coli. However, Shigella and EIEC are phylogenetically separated into multiple clusters, making high-resolution separation using cluster-specific genomic markers possible. In this study, we identified 17 Shigella or EIEC clusters, including five that were newly identified, through examination of over 17,000 publicly available Shigella and EIEC genomes. We further identified an individual cluster-specific gene marker or a set of such markers for each cluster using comparative genomic analysis. These markers can be used to classify isolates into clusters and were used to develop an in silico pipeline, ShigEiFinder (https://github.com/LanLab/ShigEiFinder), for accurate differentiation, cluster typing and serotyping of Shigella and EIEC from Illumina sequencing reads or assembled genomes. This study will have broad application, from understanding the evolution of Shigella/EIEC to diagnosis and epidemiology.

Data summary
Sequencing data have been deposited at the National Center for Biotechnology Information under BioProject number PRJNA692536.

Repositories
Raw sequence data are available from NCBI under the BioProject number PRJNA692536.
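As a rough illustration of the cluster-marker idea described in the abstract, the sketch below assigns a genome to every cluster whose complete marker set it carries, then scores one cluster's markers against labelled genomes. It is a minimal sketch under stated assumptions, not the ShigEiFinder implementation: the cluster names, gene names, and both functions are hypothetical, and the accuracy and specificity formulas are the standard confusion-matrix definitions, which may differ from how the study computed its figures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical marker sets: cluster name -> genes that must ALL be present.
# These names are placeholders, not the markers reported in the study.
MARKER_SETS: Dict[str, Set[str]] = {
    "cluster_A": {"geneA1"},
    "cluster_B": {"geneB1", "geneB2"},
}


def assign_clusters(genome_genes: Set[str],
                    marker_sets: Dict[str, Set[str]] = MARKER_SETS) -> List[str]:
    """Return, in sorted order, every cluster whose full marker set is in the genome."""
    return sorted(name for name, markers in marker_sets.items()
                  if markers <= genome_genes)


def accuracy_specificity(genomes: List[Tuple[str, Set[str]]],
                         cluster: str,
                         marker_sets: Dict[str, Set[str]] = MARKER_SETS
                         ) -> Tuple[float, float]:
    """Score one cluster's markers on (true_cluster, gene_set) pairs.

    Assumes the standard definitions:
      accuracy    = (TP + TN) / all genomes
      specificity = TN / (TN + FP)
    """
    tp = fn = tn = fp = 0
    for true_cluster, genes in genomes:
        called = cluster in assign_clusters(genes, marker_sets)
        if true_cluster == cluster:
            tp += called
            fn += not called
        else:
            fp += called
            tn += not called
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total if total else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity
```

Requiring the whole marker set, rather than any single gene, mirrors the abstract's distinction between a single marker and a marker set per cluster; a genome matching more than one cluster is returned with all matches so that ambiguous calls stay visible.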
Publisher
Cold Spring Harbor Laboratory