Abstract
The goal of many genome sequencing projects is to provide a complete representation of a target genome (or genomes) as underpinning data for further analyses. However, it can be problematic to identify which sequences in an assembly truly derive from the target genome(s) and which are derived from associated microbiome or contaminant organisms. We present BlobTools, a modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets. Using guanine+cytosine content of sequences, read coverage in sequencing libraries and taxonomy of sequence similarity matches, BlobTools can assist in primary partitioning of data, leading to improved assemblies, and screening of final assemblies for potential contaminants. Through simulated paired-end read dataset,s containing a mixture of metazoan and bacterial taxa, we illustrate the main BlobTools workflow and suggest useful parameters for taxonomic partitioning of low-complexity metagenome assemblies.
Funder
Edinburgh University School of Biological Sciences
Biotechnology and Biological Sciences Research Council
James Hutton Institute
Subject
General Pharmacology, Toxicology and Pharmaceutics,General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine
Cited by
659 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献