
The Pangenome of Escherichia coli

Published: 2024-06-08

Author:

Chauhan Siddharth M., Ardalani Omid, Hyun Jason C., Monk Jonathan M., Phaneuf Patrick V., Palsson Bernhard O.

Abstract

Thousands of complete genome sequences for strains of a species that are now available enable the advancement of pangenome analytics to a new level of sophistication. We collected 2,377 publicly available complete genomes of Escherichia coli for detailed pangenome analysis. The core genome and accessory genomes consisted of 2,398 and 5,182 genes, respectively. We developed a machine learning approach to define the accessory genes characterizing the major phylogroups of E. coli plus Shigella: A, B1, B2, C, D, E, F, G, and Shigella. The analysis resulted in a detailed structure of the genetic basis of the phylogroups’ differential traits. This pangenome structure was largely consistent with a housekeeping-gene-based MLST distribution, sequence-based Mash distance, and the Clermont quadruplex classification. The rare genome consisted of 163,619 genes, about 79% of which represented variations of 315 underlying transposon elements. This analysis generated a mathematical definition of the genetic basis for a species.

Publisher

Cold Spring Harbor Laboratory

References (52 articles)

1. Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

2. EnteroBase: hierarchical clustering of 100 000s of bacterial genomes into species/subspecies and populations

3. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database

4. ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping

5. Horizontally acquired papGII-containing pathogenicity islands underlie the emergence of invasive uropathogenic Escherichia coli lineages