Abstract
Bacterial phylogenetic analyses are commonly performed to explore relationships among bacterial species and genera based on their 16S rRNA gene sequences; however, the results are limited by mosaicism, intragenomic heterogeneity, and difficulty in distinguishing between closely related species. In this study, we aimed to perform genome-wide comparisons of different bacterial species, namely Escherichia coli, Shigella, Yersinia, Klebsiella, and Neisseria spp., as well as serotypes of Listeria monocytogenes, based on their k-mer profiles to construct phylogenetic trees. Pentanucleotide frequency analysis (512 patterns of 5 nucleotides each) was performed to distinguish highly similar taxa from each other, such as Yersinia species, as well as Shigella spp. and E. coli. Moreover, Escherichia albertii strains were clearly distinguished from E. coli and Shigella, despite being closely related to enterohemorrhagic E. coli in the phylogenetic tree. In addition, the phylogenetic tree of Ipomoea species based on pentamer frequency in chloroplast genomes correlated with previously reported morphological similarities. Furthermore, a support vector machine clearly classified E. coli and Shigella genomes based on pentanucleotide profiles. These results suggest that phylogenetic analysis based on penta- or hexamer profiles is a useful alternative for microbial phylogenetic studies. In addition, we introduce an R application, Phy5, which generates a phylogenetic tree based on genome-wide comparisons of pentamer profiles. The online version of Phy5 can be accessed at https://phy5.shinyapps.io/Phy5R/.
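As an illustration of the pentanucleotide frequency analysis described above, the following Python sketch computes a 512-element pentamer profile for a sequence and a distance between two profiles. It is not the Phy5 implementation. The abstract does not state how the 512 patterns are derived; this sketch assumes that each pentamer is merged with its reverse complement (4^5 = 1024 pentamers, halved to 512). The Euclidean distance and all function names are likewise assumptions chosen for illustration.

```python
from itertools import product
from math import sqrt

K = 5  # pentamers
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement.

    Assumption: merging strands is what reduces 1024 pentamers to 512.
    """
    return min(kmer, reverse_complement(kmer))


# Sorted list of the 512 canonical pentamers; fixes the profile's column order.
CANONICAL_PENTAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=K)}
)


def pentamer_profile(seq):
    """Return relative frequencies of canonical pentamers in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_PENTAMERS, 0)
    total = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CANONICAL_PENTAMERS)
    return [counts[m] / total for m in CANONICAL_PENTAMERS]


def profile_distance(p, q):
    """Euclidean distance between two profiles (an illustrative choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Under these assumptions, a pairwise distance matrix built with `profile_distance` over a set of genomes could be passed to any standard distance-based tree-building method, and the same profiles could serve as feature vectors for a classifier such as a support vector machine.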
Publisher
Cold Spring Harbor Laboratory