Abstract
AbstractMotivationSelecting a subset of k-mers in a string in a local manner is a common task in bioinformatics tools for speeding up computation. Arguably the most well-known and common method is the minimizer technique, which selects the ‘lowest-ordered’ k-mer in a sliding window. Recently, it has been shown that minimizers are a sub-optimal method for selecting subsets of k-mers when mutations are present. There is however a lack of understanding behind the theory of why certain methods perform well.ResultsWe first theoretically investigate the conservation metric for k-mer selection methods. We derive an exact expression for calculating the conservation of a k-mer selection method. This turns out to be tractable enough for us to prove closed-form expressions for a variety of methods, including (open and closed) syncmers, (α, b, n)-words, and an upper bound for minimizers. As a demonstration of our results, we modified the minimap2 read aligner to use a more optimal k-mer selection method and demonstrate that there is up to an 8.2% relative increase in number of mapped reads.Availability and supplementary informationSimulations and supplementary methods available at https://github.com/bluenote-1577/local-kmer-selection-results. os-minimap2 is a modified version of minimap2 and available at https://github.com/bluenote-1577/os-minimap2.Contactjshaw@math.toronto.edu
Publisher
Cold Spring Harbor Laboratory
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献