Abstract
Background: Pneumococcal genomes are highly dynamic, with varying core genome sizes. The genotype classification system, Global Pneumococcal Sequence Clusters (GPSC), has identified patterns between genotype and antibiotic resistance. A few genotypes, such as GPSC10, are frequently associated with antimicrobial resistance and high rates of non-vaccine serotypes.
Objective: To identify and annotate the differences in the core genomes of the major GPSCs in India, and to construct and analyse the Indian Pneumococcal Pangenome (IPPG).
Methods: Using an existing dataset from the Global Pneumococcal Sequencing Project, 618 strains were included. The most frequent GPSCs (GPSC1, GPSC2, GPSC8, GPSC9 and GPSC10) were analysed separately. Pangenomes were constructed using Panaroo, with the family threshold parameter tuned. Differences in protein clusters were identified using the OrthoVenn3 web server. Functional annotations were performed by eggNOG, UniProt and STRING database searches.
Results: The IPPG core genome size (1615 genes) was similar to those reported previously, with a similar distribution of metabolic categories across the five GPSC types. GPSC10 (1619 genes) and GPSC1 (1909 genes) had the smallest and largest core genomes, respectively, and these core genomes possessed genes encoding macrolide and tetracycline resistance. The virulence genes ply, psaA, pce (cbpE), pavA, nanB, lytA, and hysA were detected in all the core genomes.
Conclusions: There is genotype-specific variation within the core genomes of the major GPSCs in India. The presence of antibiotic resistance genes in the GPSC1 and GPSC10 core genomes explains the widespread drug resistance associated with these genotypes. The core virulence genes identified in all the genotypes indicate conserved pathogenesis mechanisms and can be targets for vaccine development or therapy.
Publisher
Cold Spring Harbor Laboratory