Abstract
Motivation: Advances in computer hardware have made it possible to access and process large amounts of data even on budget workstations. New and existing alignment algorithms can therefore use large index files to increase performance. Spaced seeds with large weights reduce the number of candidate locations of a read within a reference sequence. Optimal spaced-seed patterns can guarantee that reads are aligned even when they contain several substitutions.

Results: For reads of 64–200 bp, periodic spaced seeds of weight 32, 40, 48, 56 and 64 are found that guarantee that all positions of a read within a reference sequence are located for a specified number of point mutations. SIMD instructions that convert masked reads into 64-, 80-, 96-, 112- and 128-bit numbers are provided.

Availability: C code to generate the spaced seeds and to find optimal SIMD instructions for them is freely available under the MIT license at https://github.com/vtman/VSTseed
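To illustrate what converting a masked read into a fixed-width number can look like, here is a minimal scalar sketch in C. It is not the SIMD code from the repository. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3), under which a seed of weight 32 fills exactly 64 bits, and it writes the seed as a string of '1' (sampled) and '0' (ignored) characters. The function names `base_code`, `seed_weight` and `pack_seed` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a nucleotide to an assumed 2-bit code; -1 for anything else (e.g. N). */
static int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Weight of a seed: the number of '1' (sampled) positions in the pattern. */
static size_t seed_weight(const char *seed)
{
    size_t w = 0;
    for (; *seed != '\0'; ++seed)
        w += (*seed == '1');
    return w;
}

/*
 * Pack the bases of `read` found at the '1' positions of `seed` into *key,
 * two bits per base, earliest sampled base in the most significant bits.
 * Returns 0 on success, or -1 if the read is shorter than the seed or a
 * sampled position holds a non-ACGT character. The seed weight must not
 * exceed 32 so that the result fits into 64 bits.
 */
static int pack_seed(const char *read, const char *seed, uint64_t *key)
{
    size_t len = strlen(seed);
    uint64_t k = 0;

    assert(seed_weight(seed) <= 32);
    if (strlen(read) < len)
        return -1;

    for (size_t i = 0; i < len; ++i) {
        if (seed[i] != '1')
            continue;               /* ignored position: base is skipped */
        int c = base_code(read[i]);
        if (c < 0)
            return -1;
        k = (k << 2) | (uint64_t)c;
    }
    *key = k;
    return 0;
}
```

With the toy seed "1101", the reads "ACGT" and "ACTT" produce the same key, because they differ only at the ignored third position. This is the property that lets a spaced seed tolerate a substitution. Seeds of weight 40 to 64 would need a wider key type (80 to 128 bits), which this sketch does not cover.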
Publisher
Cold Spring Harbor Laboratory