Abstract
An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as genome size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that, within seconds, accurately infers genome properties across thousands of simulated and eleven real datasets spanning a broad range of complexity. We also present a new method called Smudgeplots (https://github.com/KamilSJaron/smudgeplot) to visualize and infer the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and also the extreme case of octoploid Fragaria x ananassa.
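As a rough illustration of what "analyzing k-mer frequencies" means, the sketch below builds a k-mer spectrum (how many distinct k-mers occur at each coverage) from a list of reads and derives a naive genome size estimate by dividing the total k-mer count by the modal coverage. This is not GenomeScope 2.0's method: the tool fits a polyploid-aware mixture model to the spectrum, whereas this is only the simple peak-based heuristic. The function names (`canonical`, `kmer_spectrum`, `naive_genome_size`) and the `min_cov` error cutoff are hypothetical choices made for the example.

```python
from collections import Counter

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each coverage value to the number of distinct canonical k-mers with that coverage."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(spectrum, min_cov=2):
    """Estimate genome size as total k-mers / modal coverage.

    k-mers with coverage below min_cov are dropped as likely sequencing
    errors (an assumption of this sketch, not a rule from the paper).
    """
    kept = {cov: n for cov, n in spectrum.items() if cov >= min_cov}
    if not kept:
        raise ValueError("no k-mers at or above the coverage cutoff")
    peak = max(kept, key=kept.get)  # coverage shared by the most distinct k-mers
    total = sum(cov * n for cov, n in kept.items())
    return total / peak
```

On real data the spectrum would normally come from a dedicated k-mer counter rather than a Python loop, and heterozygosity or polyploidy produces several peaks, which is the situation the mixture model described above is designed to handle.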
Publisher
Cold Spring Harbor Laboratory