Abstract
Background: Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent read coverage patterns across a genome. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments across samples to compute coverage, which becomes a key computational bottleneck.
Results: We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast, k-mer-based, alignment-free method. For multi-sample binning, fairy can be >250× faster than read alignment while remaining accurate enough for binning. Fairy is compatible with several existing binners on host-associated and non-host-associated datasets. Using MetaBAT2, fairy recovers 98.5% of MAGs with >50% completeness and <5% contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA (>1.5× more >50% complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher-quality Asgard archaea MAGs than single-sample binning and that fairy’s results are indistinguishable from read alignment.
Conclusions: Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a longstanding computational bottleneck for metagenomics.
Publisher
Cold Spring Harbor Laboratory