Abstract
Motivation
Sketching methods provide scalable solutions for analyzing rapidly growing volumes of genomic data. Syncmers, a recent innovation in sketching methods, have proven effective and have been employed for read alignment. Syncmers share fundamental features with FracMinHash, a recent modification of the popular MinHash algorithm for estimating similarity between sets of different sizes. While previous researchers have demonstrated the effectiveness of syncmers in read alignment, their potential for genomic analysis (the task FracMinHash was designed for) has not been fully realized.

Results
We demonstrate that the open syncmer sketch is equivalent to a FracMinHash sketch when applied to k-mer-based similarities, yet it exhibits a superior distance distribution and better genomic coverage. Moreover, we extend the concept of k-mer truncation to open syncmers, enabling multi-resolution estimation in metagenomics as well as flexible-sized seeding for sequence comparisons.

Reproducibility
All analysis scripts can be found on GitHub.
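The equivalence claimed in the Results above depends on one shared property of the two rules. Each decides whether to keep a k-mer by looking only at that k-mer. The Python sketch below illustrates this and is not the authors' implementation (their scripts are on GitHub). It makes several assumptions:

- BLAKE2b as the hash function.
- Forward-strand k-mers only, with no canonicalization.
- Open syncmers with offset t = 0.
- A FracMinHash scale of k - s + 1, so that both sketches keep roughly the same expected fraction of k-mers.

All function names are hypothetical.

```python
import hashlib
import random

MAX_HASH = 2**64  # hashes are 64-bit unsigned integers


def h64(text: str) -> int:
    """Deterministic 64-bit hash (assumption: BLAKE2b; any good hash works)."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq (forward strand only, no canonicalization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frac_minhash(seq: str, k: int, scale: int) -> set[str]:
    """FracMinHash: keep each k-mer whose hash is below MAX_HASH / scale."""
    threshold = MAX_HASH // scale
    return {x for x in kmers(seq, k) if h64(x) < threshold}


def is_open_syncmer(kmer: str, s: int, t: int = 0) -> bool:
    """Open syncmer test: the smallest s-mer (by hash, leftmost on ties) starts at offset t."""
    smer_hashes = [h64(kmer[j:j + s]) for j in range(len(kmer) - s + 1)]
    return smer_hashes.index(min(smer_hashes)) == t


def open_syncmer_sketch(seq: str, k: int, s: int, t: int = 0) -> set[str]:
    """Keep each k-mer that is an open syncmer; like FracMinHash, a context-free rule."""
    return {x for x in kmers(seq, k) if is_open_syncmer(x, s, t)}


def containment(a: set[str], b: set[str]) -> float:
    """Sketch-based estimate of |A & B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


# Usage: a random "genome" and a read copied from it.
random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(20000))
read = genome[5000:8000]
k, s = 21, 11
scale = k - s + 1  # open syncmers keep about 1/(k - s + 1) of k-mers

sync_g, sync_r = open_syncmer_sketch(genome, k, s), open_syncmer_sketch(read, k, s)
frac_g, frac_r = frac_minhash(genome, k, scale), frac_minhash(read, k, scale)
print(containment(sync_r, sync_g), containment(frac_r, frac_g))  # 1.0 1.0

# Truncation (t = 0): a prefix of length k2 >= s of an open syncmer is still one.
print(all(is_open_syncmer(x[:k2], s) for x in sync_g for k2 in range(s, k + 1)))  # True
```

Membership is context-free, so under either rule the read's sketch is a subset of the genome's sketch. Both containment estimates are therefore exactly 1.0.

The last check shows one simple reason truncation can carry over to open syncmers when t = 0. Suppose the smallest s-mer of a k-mer sits at its start. It then remains the smallest s-mer of every prefix of length at least s. This is offered as an illustration consistent with the idea, not necessarily the paper's exact construction.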
Publisher
Cold Spring Harbor Laboratory