Affiliation:
1. University of Virginia, USA
2. University of California San Diego, USA
Abstract
This work seeks to leverage Processing-with-storage-technology (PWST) to accelerate k-mer counting, a key bioinformatics kernel that builds a histogram of fixed-size genome sequence substrings. Because it must process large files of sequence data on disk, k-mer counting entails prohibitively high I/O overhead. In particular, this work proposes a set of accelerator designs called Abakus that offer varying tradeoffs among performance, efficiency, and hardware implementation complexity. The key to these designs is a set of domain-specific hardware extensions that accelerate the main operations of k-mer counting at various levels of the SSD hierarchy, with the goal of enhancing the limited computing capabilities of conventional SSDs while exploiting the parallelism of multi-channel, multi-way SSDs. Our evaluation suggests that Abakus can achieve 8.42×, 6.91×, and 2.32× speedups over CPU, GPU, and near-data processing solutions, respectively.
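As a rough illustration of the kernel described above, the following minimal Python sketch builds a k-mer histogram for a single in-memory sequence. It is not the Abakus design: the function name `count_kmers` and the example input are hypothetical, and the sketch ignores the file I/O, hashing, and hardware parallelism that the abstract is concerned with.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of a sequence.

    Hypothetical helper for illustration only; real k-mer counters
    stream reads from large files and use compact encodings.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


# Example: the 3-mers of a short made-up sequence.
histogram = count_kmers("ACGTACGT", 3)
for kmer, count in sorted(histogram.items()):
    print(kmer, count)
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the work and the data read both grow with the input size, which is why the counting step becomes I/O-bound on large sequence files.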
Funder
CRISP
Semiconductor Research Corporation
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software