PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM-Reference-Cited by-同舟云学术

PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM

Published:2022-11-17 Issue:1 Volume:20 Page:1-31
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Olgun Ataberk¹^ORCID,Luna Juan Gómez¹^ORCID,Kanellopoulos Konstantinos¹^ORCID,Salami Behzad²^ORCID,Hassan Hasan¹^ORCID,Ergin Oguz³^ORCID,Mutlu Onur¹^ORCID

Affiliation:

1. ETH Zurich, Gloriastrasse, Zurich, Switzerland

2. SAFARI Research Group, Zurich, Switzerland

3. TOBB University of Economics and Technology, Ankara, Turkey

Abstract

Commodity DRAM-based processing-using-memory (PuM) techniques that are supported by off-the-shelf DRAM chips present an opportunity for alleviating the data movement bottleneck at low cost. However, system integration of these techniques imposes non-trivial challenges that are yet to be solve d . Potential solutions to the integration challenges require appropriate tools to develop any necessary hardware and software components. Unfortunately, current proprietary computing systems, specialized DRAM-testing platforms, or system simulators do not provide the flexibility and/or the holistic system view that is necessary to properly evaluate and deal with the integration challenges of commodity DRAM-based PuM techniques. We design and develop Processing-in-DRAM (PiDRAM), the first flexible end-to-end framework that enables system integration studies and evaluation of real, commodity DRAM-based PuM techniques. PiDRAM provides software and hardware components to rapidly integrate PuM techniques across the whole system software and hardware stack. We implement PiDRAM on an FPGA-based RISC-V system. To demonstrate the flexibility and ease of use of PiDRAM, we implement and evaluate two state-of-the-art commodity DRAM-based PuM techniques: (i) in-DRAM copy and initialization (RowClone) and (ii) in-DRAM true random number generation (D-RaNGe) . We describe how we solve key integration challenges to make such techniques work and be effective on a real-system prototype, including memory allocation, alignment, and coherence. We observe that end-to-end RowClone speeds up bulk copy and initialization operations by 14.6× and 12.6×, respectively, over conventional CPU copy, even when coherence is supported with inefficient cache flush operations. Over PiDRAM’s extensible codebase, integrating both RowClone and D-RaNGe end-to-end on a real RISC-V system prototype takes only 388 lines of Verilog code and 643 lines of C++ code.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Information Systems,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3563697

Reference177 articles.

1. Shaizeen Aga, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das. 2017. Compute caches. In HPCA.

2. Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In ISCA.

3. Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture. In ISCA.

4. Berkin Akin, Franz Franchetti, and James C. Hoe. 2015. Data reorganization in memory using 3D-stacked DRAM. In ISCA.

5. In-Memory Low-Cost Bit-Serial Addition Using Commodity DRAM Technology

Cited by 11 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space;2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA);2024-06-29

2. Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis;2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN);2024-06-24

3. Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

4. Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

5. MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02