Abstract
Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete microbial genomes (i.e., circular with no misassemblies) from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Fewer than 100 circularized bacterial and archaeal genomes have been assembled from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein-coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a method to achieve circularized genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer-based assemblies. We focused our initial efforts on species of the Candidate Phyla Radiation (CPR) because they have small genomes and are only known to have one ribosomal RNA operon. We present 34 circular CPR genomes, one circular Margulisbacteria genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA. Code and a tutorial for this method are available at https://github.com/lmlui/Jorg.
Publisher
Cold Spring Harbor Laboratory