Abstract
Staphylococcus aureus causes both hospital- and community-acquired infections in humans worldwide. Because of its high incidence of infection, S. aureus is also one of the most sampled and sequenced pathogens today, providing an outstanding resource for understanding variation at the bacterial subspecies level. We processed and downsampled 83,383 public S. aureus Illumina whole-genome shotgun sequences and 1,263 complete genomes to produce 7,954 representative substrains. Pairwise comparison of core-gene Average Nucleotide Identity (ANI) revealed a natural boundary at 99.5% that could be used to define 145 distinct strains within the species. We found that intermediate-frequency genes in the pangenome (present in 10-95% of genomes) could be divided into those closely linked to strain background (“strain-concentrated”) and those highly variable within strains (“strain-diffuse”). Non-core genes showed different patterns of chromosomal location. Notably, strain-diffuse genes were associated with prophages, strain-concentrated genes with the vSaβ genomic island, and rare genes (<10% frequency) were concentrated near the origin of replication. Antibiotic genes were enriched in the strain-diffuse class, while virulence genes were distributed across the strain-diffuse, strain-concentrated, core and rare classes. This study shows how different patterns of gene movement help create strains as distinct subspecies entities, and it provides insight into the diverse histories of important S. aureus functions.
Importance
We analyzed the genomic diversity of Staphylococcus aureus, a globally prevalent bacterial species that causes serious infections in humans. Our goal was to build a genetic picture of the different strains of S. aureus and of the genes that may be associated with them. We used a large public dataset (>84,000 genomes) that was reprocessed and subsampled to remove redundancy. We found that individual genomes could be grouped into strains when they share >99.5% nucleotide identity across the core part of their genome. We also showed that a portion of the genes present at intermediate frequency in the species are strongly associated with some strains but completely absent from others, suggesting a role in strain specificity. This work lays the foundation for understanding individual gene histories within the S. aureus species and also outlines strategies for processing large bacterial genomic datasets.
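The abstract does not say how substrains were merged into strains once the 99.5% core-gene ANI boundary was chosen. The sketch below assumes single-linkage grouping: connected components of a graph that links any two substrains whose core-gene ANI is strictly above the threshold. It also assumes that pairwise ANI values were precomputed. The function name `cluster_strains` and its input layout are hypothetical and do not describe the authors' pipeline.

```python
from itertools import combinations


def cluster_strains(genomes, core_ani, threshold=99.5):
    """Hypothetical helper: group genomes into strains by single linkage.

    genomes   -- list of genome (substrain) identifiers
    core_ani  -- dict mapping frozenset({a, b}) to core-gene ANI in percent
    threshold -- pairs with ANI strictly above this value are linked
                 (assumption, matching the ">99.5%" wording)

    Pairs missing from core_ani are treated as unlinked.
    Returns a sorted list of sorted strain clusters.
    """
    parent = {g: g for g in genomes}

    def find(x):
        # Walk to the root, halving the path as we go (union-find).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        ani = core_ani.get(frozenset((a, b)))
        if ani is not None and ani > threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Toy example: A-B and B-C exceed 99.5%, so A and C join the same strain
# through B even though their own ANI (99.4%) is below the boundary.
ani = {
    frozenset(("A", "B")): 99.8,
    frozenset(("B", "C")): 99.6,
    frozenset(("A", "C")): 99.4,
    frozenset(("C", "D")): 98.9,
}
print(cluster_strains(["A", "B", "C", "D"], ani))  # [['A', 'B', 'C'], ['D']]
```

Single linkage is the simplest choice. As the toy example shows, it can chain two genomes together even when their own ANI falls below the boundary. A complete-linkage rule would avoid this, but it can split groups that a "natural boundary" would keep together.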
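The pangenome frequency classes used above (rare <10%, intermediate 10-95%, core above that) can be computed from a gene presence/absence table. The sketch below is a minimal version that assumes frequencies are taken over the representative substrains. Both function names are hypothetical, and so is the handling of the exact 10% and 95% boundaries.

```python
from collections import Counter


def gene_frequencies(presence):
    """Fraction of genomes carrying each gene.

    presence -- dict mapping genome ID to the set of gene (cluster) IDs it carries
    """
    n = len(presence)
    counts = Counter(gene for genes in presence.values() for gene in set(genes))
    return {gene: c / n for gene, c in counts.items()}


def frequency_class(freq, rare_max=0.10, core_min=0.95):
    """Bin a gene frequency into rare / intermediate / core.

    Cut-offs follow the abstract (rare <10%, intermediate 10-95%); treating
    frequencies above 95% as core and exactly 10% or 95% as intermediate
    is an assumption.
    """
    if freq < rare_max:
        return "rare"
    if freq > core_min:
        return "core"
    return "intermediate"


# Toy pangenome of 20 genomes: one gene in all, one in half, one in a single genome.
presence = {
    f"g{i}": {"coreA"} | ({"midB"} if i < 10 else set()) | ({"rareC"} if i == 0 else set())
    for i in range(20)
}
freqs = gene_frequencies(presence)
for gene in sorted(freqs):
    print(gene, freqs[gene], frequency_class(freqs[gene]))
```

The sketch stops at frequency classes. The abstract does not state the criterion used to split intermediate genes into strain-concentrated and strain-diffuse groups.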
Publisher
Cold Spring Harbor Laboratory