PIM-Tree

Published:2022-12 Issue:4 Volume:16 Page:946-958
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Kang Hongbo¹,Zhao Yiwei²,Blelloch Guy E.²,Dhulipala Laxman³,Gu Yan⁴,McGuffey Charles⁵,Gibbons Phillip B.²

Affiliation:

1. Tsinghua University

2. Carnegie Mellon University

3. University of Maryland

4. UC Riverside

5. Reed College

Abstract

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck by enabling low-latency memory access whose aggregate memory bandwidth scales with the number of PIM nodes. There is an inherent tension, however, between minimizing inter-node communication and achieving load balance in PIM systems in the presence of workload skew. This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in data and queries. Our skew-resistant index is based on a novel division of labor between the host CPU and PIM nodes, which leverages the strengths of each. We introduce push-pull search, which dynamically decides whether to push queries to a PIM-tree node or pull the node's keys back to the CPU based on workload skew. Combined with other PIM-friendly optimizations (shadow subtrees and chunked skip lists), our PIM-tree provides high throughput, (guaranteed) low communication, and (guaranteed) high load balance for batches of point queries, updates, and range scans. We implement PIM-tree, in addition to prior proposed PIM indexes, on the latest PIM system from UPMEM, with 32 CPU cores and 2048 PIM nodes. On workloads with 500 million keys and batches of 1 million queries, the throughput using PIM-trees is up to 69.7x and 59.1x higher than the two best prior PIM-based methods. As far as we know, these are the first implementations of an ordered index on a real PIM system.
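The push-pull decision described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `plan_push_pull`, the `route` callback, and the `pull_threshold` cutoff are all hypothetical, and the rule "pull a node when it receives more queries than a fixed threshold" is an assumed stand-in for whatever skew test the authors actually use.

```python
from collections import Counter

def plan_push_pull(batch, route, pull_threshold):
    """Split one batch of query keys into 'push' and 'pull' work per node.

    batch          -- query keys in the current batch
    route          -- hypothetical function: key -> id of the node owning it
    pull_threshold -- hypothetical cutoff; a node hit by more queries than
                      this is treated as hot and handled on the CPU
    """
    batch = list(batch)                       # allow any iterable
    load = Counter(route(k) for k in batch)   # queries per node in this batch
    push, pull = {}, {}
    for key in batch:
        node = route(key)
        # Hot node: pull its keys to the CPU. Cold node: push the query out.
        target = pull if load[node] > pull_threshold else push
        target.setdefault(node, []).append(key)
    return push, pull

# Toy example: keys 0-99 live on node 0, 100-199 on node 1, and so on.
push, pull = plan_push_pull([5, 7, 9, 11, 150, 260], lambda k: k // 100, 3)
```

In this toy batch, node 0 receives four queries and exceeds the assumed threshold of three, so its queries land in `pull`; nodes 1 and 2 receive one query each and stay in `push`. The point of the sketch is only that a skewed batch does not overload a single PIM node.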

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3574245.3574275

Reference 36 articles.

1. Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). 105-117. 10.1145/2749469.2750386

2. Shaahin Angizi, Naima Ahmed Fahmi, Wei Zhang, and Deliang Fan. 2020. PIM-Assembler: A Processing-in-Memory Platform for Genome Assembly. In 2020 57th ACM/IEEE Design Automation Conference (DAC). 1-6. 10.1109/DAC18072.2020.9218653

3. Shaahin Angizi, Zhezhi He, Adnan Siraj Rakin, and Deliang Fan. 2018. CMP-PIM: An Energy-Efficient Comparator-Based Processing-in-Memory Neural Network Accelerator. In Proceedings of the 55th Annual Design Automation Conference (San Francisco, California) (DAC '18). Association for Computing Machinery, New York, NY, USA, Article 105, 6 pages. 10.1145/3195970.3196009

4. Maya Arbel-Raviv, Trevor Brown, and Adam Morrison. 2018. Getting to the Root of Concurrent Binary Search Tree Performance. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11-13, 2018, Haryadi S. Gunawi and Benjamin Reed (Eds.). USENIX Association, 295-306. https://www.usenix.org/conference/atc18/presentation/arbel-raviv

5. On Weighted Balls-into-bins Games;Berenbrink Petra;Theoretical Computer Science,2008

Cited by 8 articles.

1. RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems;IEEE Transactions on Parallel and Distributed Systems;2024-09

2. NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

3. UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

4. PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware;Proceedings of the ACM on Management of Data;2024-05-29

5. Energy Efficiency Impact of Processing in Memory: A Comprehensive Review of Workloads on the UPMEM Architecture;Lecture Notes in Computer Science;2024