Abstract
Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily.
Results: We present Whisper, an accurate and high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism, together with storing temporary data on disk, results in superior time efficiency with reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. Experiments with real data indicate that our solution runs in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at a comparable accuracy (validated in a variant calling pipeline).
Availability: Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/
Contact: sebastian.deorowicz@polsl.pl
Supplementary information: Supplementary data are available at the publisher's website.
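The core idea named in the abstract (sort the reads, then look them up in suffix arrays built for the reference and for its reverse complement) can be illustrated with a toy sketch. This is not Whisper's implementation: it handles exact matches only, builds the suffix arrays naively, keeps everything in memory, and uses no parallelism. All function names (`reverse_complement`, `build_suffix_array`, `find_exact`, `map_reads`) are hypothetical and chosen for illustration. The one property it tries to show is that, because the reads are processed in sorted order, the binary search for each read can start where the previous one ended.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def build_suffix_array(text):
    """Naive suffix array construction; adequate only for toy inputs."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern, lo=0):
    """Find exact occurrences of pattern using the suffix array sa.

    lo is a lower bound for the search, carried over from the previous
    (lexicographically smaller or equal) pattern.
    Returns (sorted match positions, lower bound for the next pattern).
    """
    m = len(pattern)
    left, right = lo, len(sa)
    # Binary search for the first suffix whose m-character prefix >= pattern.
    while left < right:
        mid = (left + right) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            left = mid + 1
        else:
            right = mid
    start = left
    hits = []
    i = start
    # All matching suffixes are contiguous in the suffix array.
    while i < len(sa) and text[sa[i]:sa[i] + m] == pattern:
        hits.append(sa[i])
        i += 1
    return sorted(hits), start


def map_reads(reference, reads):
    """Map reads (exact matches only) to both strands of reference."""
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rev = build_suffix_array(rc)
    n = len(reference)
    lo_fwd = lo_rev = 0
    results = {}
    # Sorted order makes the suffix-array lower bounds non-decreasing.
    for read in sorted(set(reads)):
        fwd, lo_fwd = find_exact(reference, sa_fwd, read, lo_fwd)
        rev, lo_rev = find_exact(rc, sa_rev, read, lo_rev)
        # Convert positions on the reverse-complement string
        # to forward-strand coordinates.
        rev_on_fwd = sorted(n - p - len(read) for p in rev)
        results[read] = {"+": fwd, "-": rev_on_fwd}
    return results
```

Calling `map_reads("GATTACAGATTC", ["TGTA", "GATT", "CCCC"])` returns, for each distinct read, the 0-based forward-strand start positions of its exact matches on the `+` and `-` strands. A practical mapper would additionally need to tolerate mismatches and indels, which this sketch does not attempt.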
Publisher
Cold Spring Harbor Laboratory