Affiliation:
1. Massachusetts Institute of Technology
2. Quanta Research Cambridge
Abstract
Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5 TB to 20 TB. For such a dataset, one would need a cluster of 100 servers, each with 128 GB to 256 GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could easily be stored in the flash memory of a rack-sized cluster. Flash storage offers much better random-access performance than hard disks, which makes it attractive for analytics workloads. In this paper we present BlueDBM, a new system architecture comprising flash-based storage with in-store processing capability and a low-latency, high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a RAM-cloud system falls sharply even if only 5% to 10% of the references go to secondary storage, this sharp degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
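The abstract's claim about RAM-cloud degradation follows from a simple average-access-time argument: even a small fraction of references to much slower secondary storage dominates the mean latency. The sketch below illustrates this with assumed order-of-magnitude latencies (the specific numbers are not from the paper):

```python
# Illustrative average-access-time model (assumed figures, not from the paper):
# effective latency when a fraction p_miss of references fall outside DRAM
# and must be served from secondary storage.
DRAM_NS = 100           # assumed DRAM access latency (~100 ns)
FLASH_NS = 100_000      # assumed flash random-read latency (~100 us)
DISK_NS = 10_000_000    # assumed disk seek latency (~10 ms)

def effective_latency_ns(p_miss: float, storage_ns: float) -> float:
    """Mean latency when p_miss of references go to secondary storage."""
    return (1 - p_miss) * DRAM_NS + p_miss * storage_ns

# With just 5% of references going to disk, mean latency grows ~5000x
# relative to pure DRAM; with flash the same miss rate costs ~50x.
slowdown_disk = effective_latency_ns(0.05, DISK_NS) / DRAM_NS
slowdown_flash = effective_latency_ns(0.05, FLASH_NS) / DRAM_NS
```

This is why a small secondary-storage miss rate is catastrophic for a DRAM-resident design, while a system built around flash from the start degrades far more gracefully.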
Publisher
Association for Computing Machinery (ACM)
Cited by
14 articles.
1. MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29
2. Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02
3. BeaconGNN: Large-Scale GNN Acceleration with Out-of-Order Streaming In-Storage Computing;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02
4. Barad-dur: Near-Storage Accelerator for Training Large Graph Neural Networks;2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT);2023-10-21
5. Leveraging Computational Storage for Power-Efficient Distributed Data Analytics;ACM Transactions on Embedded Computing Systems;2022-10-18