Abstract
Motivation: Recent advances in sequencing technologies have stressed the critical role of sequence analysis algorithms and tools in genomics and healthcare research. In particular, sequence alignment is a fundamental building block in many sequence analysis pipelines and is frequently a performance-critical bottleneck both in time and memory. Classical sequence alignment algorithms are based on dynamic programming and often require quadratic time and memory with respect to the sequence length. As a result, classic sequence alignment algorithms fail to scale with increasing sequence lengths and quickly become memory-bound due to data-movement penalties.
Results: Processing-In-Memory (PIM) is an emerging architectural paradigm that seeks to accelerate memory-bound algorithms by bringing computation closer to the data to mitigate data-movement penalties. This work presents BIMSA (Bidirectional In-Memory Sequence Alignment), a PIM-optimized implementation of the state-of-the-art sequence alignment algorithm BiWFA (Bidirectional Wavefront Alignment), incorporating hardware-aware optimizations for a production-ready PIM architecture (UPMEM). BIMSA achieves speedups up to 22.24× (11.95× on average) compared to state-of-the-art PIM-enabled implementations of sequence alignment algorithms, and supports aligning sequences of thousands of bases, exceeding the limitations of current PIM-accelerated implementations. BIMSA also achieves speedups up to 5.84× (2.83× on average) compared to the most efficient multicore CPU implementation of BiWFA. Most notably, BIMSA exhibits linear scalability with the number of compute units, enabling further performance improvements with upcoming PIM architectures equipped with more compute units and achieving speedups up to 9.56× (4.7× on average).
Availability: Code and documentation are publicly available at https://github.com/AlejandroAMarin/BIMSA.
Contact: alejandro.alonso1@bsc.es
Publisher
Cold Spring Harbor Laboratory