PIM-Tree

Author:

Kang Hongbo (1), Zhao Yiwei (2), Blelloch Guy E. (2), Dhulipala Laxman (3), Gu Yan (4), McGuffey Charles (5), Gibbons Phillip B. (2)

Affiliation:

1. Tsinghua University

2. Carnegie Mellon University

3. University of Maryland

4. UC Riverside

5. Reed College

Abstract

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck by enabling low-latency memory access whose aggregate memory bandwidth scales with the number of PIM nodes. There is an inherent tension, however, between minimizing inter-node communication and achieving load balance in PIM systems in the presence of workload skew. This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in data and queries. Our skew-resistant index is based on a novel division of labor between the host CPU and PIM nodes, which leverages the strengths of each. We introduce push-pull search, which dynamically decides, based on workload skew, whether to push queries to a PIM-tree node or pull the node's keys back to the CPU. Combined with other PIM-friendly optimizations (shadow subtrees and chunked skip lists), our PIM-tree provides high throughput, (guaranteed) low communication, and (guaranteed) high load balance for batches of point queries, updates, and range scans. We implement PIM-tree, in addition to previously proposed PIM indexes, on the latest PIM system from UPMEM, with 32 CPU cores and 2048 PIM nodes. On workloads with 500 million keys and batches of 1 million queries, the throughput using PIM-trees is up to 69.7x and 59.1x higher than the two best prior PIM-based methods. As far as we know, these are the first implementations of an ordered index on a real PIM system.
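The push-pull decision described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `plan_push_pull`, the `node_of` mapping, and the `pull_threshold` cutoff are all hypothetical, and the abstract does not say how the real threshold is chosen. The sketch only shows the general idea of grouping a batch by target node and pulling a node's keys to the CPU when too many queries contend for it.

```python
from collections import defaultdict


def plan_push_pull(batch, node_of, pull_threshold):
    """Group a batch of query keys by target node and pick push or pull per node.

    node_of: hypothetical function mapping a key to the id of the node serving it.
    pull_threshold: hypothetical cutoff; a node hit by more queries than this is pulled.
    """
    by_node = defaultdict(list)
    for key in batch:
        by_node[node_of(key)].append(key)

    plan = {}
    for node, keys in by_node.items():
        # Many queries on one node signal skew: fetching that node's keys to the
        # CPU once avoids overloading a single PIM module with the whole group.
        action = "pull" if len(keys) > pull_threshold else "push"
        plan[node] = (action, keys)
    return plan


# A skewed batch: most queries fall in the key range served by node 0.
batch = [1, 2, 3, 4, 5, 6, 150, 250]
plan = plan_push_pull(batch, node_of=lambda k: k // 100, pull_threshold=4)
for node, (action, keys) in sorted(plan.items()):
    print(node, action, keys)
```

In this toy batch, node 0 receives six queries and is pulled, while nodes 1 and 2 each receive one query and have it pushed to them.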

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development


Cited by 8 articles.

1. RADAR: A Skew-Resistant and Hotness-Aware Ordered Index Design for Processing-in-Memory Systems;IEEE Transactions on Parallel and Distributed Systems;2024-09

2. NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

3. UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

4. PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware;Proceedings of the ACM on Management of Data;2024-05-29

5. Energy Efficiency Impact of Processing in Memory: A Comprehensive Review of Workloads on the UPMEM Architecture;Lecture Notes in Computer Science;2024
