16S rRNA phylogeny and clustering is not a reliable proxy for genome-based taxonomy in Streptomyces

Author:

Kiepas Angelika B, Hoskisson Paul A, Pritchard Leighton

Abstract

Although Streptomyces is one of the most extensively studied genera of bacteria, its taxonomy remains contested and is suspected to contain significant species-level misclassification. Resolving the classification of Streptomyces would benefit many areas of study and applied microbiology that rely heavily on having an accurate ground truth classification of similar and dissimilar organisms, including comparative genomics-based searches for novel antimicrobials in the fight against the ongoing antimicrobial resistance (AMR) crisis. To attempt a resolution, we investigate taxonomic conflicts between 16S rRNA and whole-genome classifications using all 48,981 available full-length 16S rRNA Streptomyces sequences from the combined SILVA, Greengenes, Ribosomal Database Project (RDP) and NCBI (National Center for Biotechnology Information) databases, and 2,276 publicly available Streptomyces genome assemblies. We construct a 16S gene tree for 14,239 distinct Streptomyces 16S rRNA sequences, identifying three major lineages of Streptomyces, and find that existing taxonomic classifications are inconsistent with the tree topology. We also use these data to delineate 16S and whole-genome landscapes for Streptomyces, finding that 16S and whole-genome classifications of Streptomyces strains are frequently in disagreement, and in particular that 16S zero-radius Operational Taxonomic Units (zOTUs) are often inconsistent with Average Nucleotide Identity (ANI)-based taxonomy. Our results strongly imply that 16S rRNA sequence data does not map to taxonomy sufficiently well to delineate Streptomyces species reliably, and we propose that alternative markers should instead be adopted by the community for classification and metabarcoding.
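To make the zOTU versus ANI comparison concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that ANI-based species are obtained by single-linkage grouping of genomes whose pairwise ANI is at least 0.95 (a commonly used species boundary, taken here as an assumption, with coverage ignored), and it approximates a zOTU as a group of identical full-length 16S sequences. The function names and input dictionaries are hypothetical.

```python
from collections import defaultdict


def ani_species_groups(genomes, pairwise_ani, threshold=0.95):
    """Single-linkage grouping: genomes are joined when ANI >= threshold (assumed 0.95)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    # Relabel each root as a small integer species-group ID
    labels, group_of = {}, {}
    for g in sorted(genomes):
        group_of[g] = labels.setdefault(find(g), len(labels))
    return group_of


def inconsistent_zotus(genome_16s, group_of):
    """Return zOTUs (identical 16S sequences) whose genomes fall in more than one ANI group."""
    groups_by_seq = defaultdict(set)
    for genome, seqs in genome_16s.items():
        for seq in seqs:
            groups_by_seq[seq].add(group_of[genome])
    return {seq: groups for seq, groups in groups_by_seq.items() if len(groups) > 1}
```

With placeholder genomes G1 to G4, where only G1/G2 and G3/G4 exceed the threshold, a 16S sequence carried by both G2 and G3 is reported as a zOTU spanning two ANI species.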
As much of current Streptomyces taxonomy has been determined or supported by historical 16S sequence data and may in parts be in error, we also propose that reclassification of the genus by alternative approaches is required.

Impact Statement

Accurate classification of microbes, usually in the form of taxonomic assignments, provides a fundamental ground truth or reference point for many aspects of applied microbiology including comparative genomics, identification of strains for natural product discovery, and dereplication of strains. Bacteria belonging to the genus Streptomyces are an important source of bioactive metabolites and enzymes in biotechnology, and proper understanding of their phylogeny aids understanding of the evolution of industrially important gene products and metabolites, and prioritization of strains for industrial exploitation. Taxonomic classification in the genus Streptomyces is complex and contested, and there are clear conflicts between taxonomies inferred from 16S rRNA and from whole-genome sequences. Despite this, 16S sequence-based classifications are still widely used to infer taxonomic identity, to determine community composition, and to prioritise strains for study. We investigate a diverse and comprehensive set of Streptomyces genomes using whole-genome Average Nucleotide Identity (ANI) and 16S sequence analysis to delineate and compare classifications made using these approaches. We outline the genomic and 16S sequence landscape of Streptomyces, demonstrating that (i) distinct taxonomic species may share identical full-length 16S sequences, and (ii) in some instances, isolates representing the same taxonomic species do not share any common 16S rRNA sequence. Our results strongly imply that 16S rRNA sequence variation does not map to taxonomy sufficiently well to delineate Streptomyces species reliably, and that alternative markers should instead be adopted by the community.
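The two observations (i) and (ii) above can be checked mechanically once each genome is mapped to its set of full-length 16S copies and to a species label. The sketch below is illustrative only: the input dictionaries and function names are hypothetical, and observation (ii) is interpreted here, as an assumption, as "some pair of genomes of the same species has disjoint 16S sets".

```python
from collections import defaultdict
from itertools import combinations


def species_sharing_sequences(genome_16s, species_of):
    """Observation (i): pairs of distinct species that share an identical 16S sequence."""
    species_by_seq = defaultdict(set)
    for genome, seqs in genome_16s.items():
        for seq in seqs:
            species_by_seq[seq].add(species_of[genome])
    shared = set()
    for species in species_by_seq.values():
        # Every pair of species carrying this exact sequence is recorded once
        for a, b in combinations(sorted(species), 2):
            shared.add((a, b))
    return shared


def species_with_disjoint_isolates(genome_16s, species_of):
    """Observation (ii): species containing a pair of genomes with no 16S sequence in common."""
    genomes_by_species = defaultdict(list)
    for genome, species in species_of.items():
        genomes_by_species[species].append(genome)
    result = set()
    for species, genomes in genomes_by_species.items():
        for a, b in combinations(genomes, 2):
            if not set(genome_16s[a]) & set(genome_16s[b]):
                result.add(species)
                break
    return result
```

Because Streptomyces genomes often carry several 16S copies, each genome is treated as a set of sequences rather than a single sequence; sharing any one copy is enough to link two species in (i).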
Much of current Streptomyces taxonomy has been determined or supported by historical 16S sequence data, and we therefore propose that reclassification within this group by alternative approaches is required.

Data summary

All code, raw and supporting data are publicly available from GitHub (https://github.com/kiepczi/Kiepas_et_al_2023_16S) and Zenodo (https://doi.org/10.5281/zenodo.8223787). The flowchart provided in Supplementary File 28 gives an overview of the analysis steps and serves as a guide through the Supplementary Files generated during reconstruction of the 16S phylogeny. The flowchart in Supplementary File 29 outlines the workflow processes and supplementary materials used for analysis of 16S rRNA sequences from Streptomyces genomes.

Supplementary Data

Supplementary File 1. Generate figures using Python and R. ZIP file containing all data, and the Python and R scripts used to generate figures for this manuscript. (ZIP 40.9MB)
Supplementary File 2. Raw 16S rRNA public databases. ZIP file containing four separate txt files with sequence IDs for the public 16S rRNA databases used in this manuscript, an additional txt file with Greengenes sequence taxonomy information, and a Python script used to map taxonomy information to sequences found in Greengenes v13.5. (ZIP 34.8MB)
Supplementary File 3. Filtration of 16S rRNA public databases. ZIP file containing the Python script used to filter the raw databases, and the generated outputs. (ZIP 7.2MB)
Supplementary File 4. Cleaning of the filtered 16S rRNA local databases. ZIP file containing all bash and Python scripts used to clean the full-length 16S rRNA local databases by removing redundant and poor-quality 16S rRNA sequences. (ZIP 9MB)
Supplementary File 5. Sequence clustering. ZIP file containing a bash script used to cluster the cleaned full-length local 16S rRNA Streptomyces databases at various thresholds, and txt files with accessions of representative sequences and cluster members for each clustering threshold. (ZIP 40.8MB)
Supplementary File 6. Analysis of taxonomic composition for each clustering threshold. ZIP file containing Python scripts, NCBI taxonomy input, and all generated outputs used to determine the taxonomic composition for each clustering threshold. (ZIP 49.6)
Supplementary File 7. Cluster sizes. Empirical cumulative frequency plot showing cluster sizes generated for all clustering thresholds. (PDF 44KB)
Supplementary File 8. Cluster taxID abundance. Empirical cumulative frequency plot of the number of unique taxIDs present at all clustering thresholds. (PDF 9KB)
Supplementary File 9. MSA. ZIP file containing all Python and bash scripts, and additional data, needed to generate and clean the multiple sequence alignment (MSA) for phylogenetic analysis. (ZIP 4.2MB)
Supplementary File 10. Phylogenetic reconstruction. ZIP file containing bash scripts used for phylogenetic reconstruction, and all generated outputs and log files. (ZIP 16.8MB)
Supplementary File 11. Collapse branches. ZIP file containing the Jupyter notebook used to collapse branches with the same species names, and the collapsed tree in Newick format. (ZIP 385KB)
Supplementary File 12. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree, with branches marked that have transfer bootstrap expectation (TBE) support >= 50%. (PDF 224KB)
Supplementary File 13. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of Streptomyces albus and Streptomyces griseus. (PDF 229KB)
Supplementary File 14. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of Streptomyces albulus, Streptomyces lydicus and Streptomyces venezuelae. (PDF 228KB)
Supplementary File 15. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of Streptomyces clavuligerus and Streptomyces coelicolor. (PDF 227KB)
Supplementary File 16. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of Streptomyces lavendulae, Streptomyces rimosus and Streptomyces scabiei. (PDF 228KB)
Supplementary File 17. Streptomyces genomes. ZIP file containing bash scripts used to download Streptomyces genomes, and Python scripts used to check assembly status. The ZIP file also contains two separate txt files listing the Streptomyces genomes used in this manuscript: one with all initial candidates, and a second with replaced genomes. (ZIP 2.6MB)
Supplementary File 18. Extraction of full-length and ambiguity-free 16S rRNA sequences from Streptomyces genomes. ZIP file containing all Python and bash scripts used to extract full-length sequences from the filtered Streptomyces genomes, a single FASTA file with all extracted 16S rRNA sequences, a single FASTA file with filtered sequences, and a txt file with accessions of the genomes retained in the analysis. (ZIP 742KB)
Supplementary File 19. ANI analysis among Streptomyces genomes with identical 16S rRNA sequences. ZIP file containing all bash and Python scripts used to determine taxonomic boundaries among Streptomyces genomes sharing identical full-length 16S rRNA sequences, and all output and pyANI log files. (ZIP 37.1MB)
Supplementary File 20. Network analysis of genomes based on shared 16S sequences. ZIP file containing a Jupyter notebook with the NetworkX analysis and all associated output files, including a bash script for pyANI analysis runs on each connected component and all associated matrices, heatmaps and log files. (ZIP 29.3MB)
Supplementary File 21. Interactive network graph. HTML file containing an interactive network graph of genomes sharing common full-length 16S sequences, with each node colour corresponding to the number of connections (degree). (HTML 4.7MB)
Supplementary File 22. Interactive network graph. HTML file containing an interactive network graph of genomes sharing common full-length 16S sequences, showing clique (blue) and non-clique (green) components. (HTML 4.7MB)
Supplementary File 23. Interactive network graph. HTML file containing an interactive network graph of genomes sharing common full-length 16S sequences, showing the number of unique genera within each connected component. Each candidate genus is represented by a single node colour within a connected component. (HTML 4.7MB)
Supplementary File 24. Interactive network graph. HTML file containing an interactive network graph of genomes sharing common full-length 16S rRNA sequences, showing the number of unique species within each connected component. Each candidate species is represented by a single node colour within a connected component. (HTML 4.7MB)
Supplementary File 25. Interactive network graph. HTML file containing an interactive network graph of genomes sharing common full-length 16S rRNA sequences, showing the number of unique NCBI names within each connected component. Each NCBI-assigned name is represented by a single node colour within a connected component. Grey nodes represent genomes currently lacking assigned species names. (HTML 4.7MB)
Supplementary File 26. Intragenomic 16S rRNA heterogeneity within 1,369 Streptomyces genomes that contain only full-length, ambiguity-symbol-free 16S rRNA sequences. A total of 811 genomes containing single 16S rRNA sequences are not shown. (PDF 8KB)
Supplementary File 27. Distribution of 16S copies per genome, distinguishing unique and total copies, for genomes at the Complete and Chromosome assembly levels. (PDF 7KB)
Supplementary File 28. Schematic workflow for construction of the full-length 16S rRNA Streptomyces phylogeny. Each arrow represents a process and is annotated with the script used and the corresponding supplementary file. Output/data files, and the number of sequences remaining after each step, are indicated by rectangles. The green shading represents a single processing step of collecting and collating 16S database sequences. (PDF 91KB)
Supplementary File 29. Schematic representation of the pipeline used to filter publicly available Streptomyces genomes. (PDF 59KB)
Supplementary File 30. Sankey plot showing counts of taxonomic names in source databases, assigned at ranks from phylum to genus, to sequences identified with the keyword 'Streptomyces' in the taxonomy field. Note that Actinobacteria and Actinobacteriota are synonyms in LPSN for the correct Phylum name Actinomycetota, but that Actinomycetales and Streptomycetales are not taxonomic synonyms of each other. Streptomycetales is synonymous in LPSN with the correct name Kitasatosporales; Actinomycetales is a distinct taxonomic Order. The parent Order of the Family Streptomycetaceae in LPSN is Kitasatosporales. (PDF 64KB)
Supplementary File 31. Rectangular phylogram of the comprehensive maximum-likelihood tree of the genus Streptomyces based on the 16S sequence diversity of all 5,064 full-length 16S rRNA sequences with 100 TBE values. (PDF 194KB)
Supplementary File 32. Genomes sharing identical 16S rRNA sequences are assigned different names in NCBI. A total of 1,030 singleton clusters are not shown. (PDF 8KB)
Supplementary File 33. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of members of the novel Actinacidiphila genus. (PDF 228KB)
Supplementary File 34. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of members of the novel Phaeacidiphilus genus. (PDF 228KB)
Supplementary File 35. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of members of the novel Mangrovactinospora genus. (PDF 228KB)
Supplementary File 36. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of members of the novel Wenjunlia genus. (PDF 228KB)
Supplementary File 37. Phylogenetic tree. PDF file showing the collapsed phylogenetic tree and the distribution of members of the novel Streptantibioticus genus. (PDF 228KB)
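Supplementary Files 20 to 22 describe a network in which genomes are linked when they share a full-length 16S sequence, split into connected components that are classed as cliques or non-cliques. The sketch below illustrates that idea in dependency-free Python; it is not the notebook in Supplementary File 20, which used NetworkX (where `networkx.connected_components` would replace the breadth-first search). The function names and the genome-to-sequences mapping are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def shared_16s_graph(genome_16s):
    """Adjacency sets: two genomes are linked if they share any identical full-length 16S sequence."""
    genomes_by_seq = defaultdict(set)
    for genome, seqs in genome_16s.items():
        for seq in seqs:
            genomes_by_seq[seq].add(genome)
    adj = {genome: set() for genome in genome_16s}
    for genomes in genomes_by_seq.values():
        for a, b in combinations(genomes, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components_with_clique_flag(adj):
    """Yield (sorted component members, is_clique) for each connected component."""
    seen = set()
    for start in sorted(adj):
        if start in seen:
            continue
        # Breadth-first search collects every genome reachable from `start`
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        # A component is a clique when every pair of members is directly linked
        n = len(component)
        n_edges = sum(len(adj[v]) for v in component) // 2
        yield sorted(component), n_edges == n * (n - 1) // 2
```

A non-clique component arises when, for example, G1 and G3 never share a sequence with each other but are each linked through a multicopy genome G2. Note that this sketch reports singleton genomes as trivial cliques; Supplementary File 32 instead sets singleton clusters aside.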

Publisher

Cold Spring Harbor Laboratory
