Fusing In-storage and Near-storage Acceleration of Convolutional Neural Networks

Published:2023-11-14 Issue:1 Volume:20 Page:1-22
ISSN:1550-4832
Container-title:ACM Journal on Emerging Technologies in Computing Systems
language:en
Short-container-title:J. Emerg. Technol. Comput. Syst.

Author:

Okafor Ikenna¹, Ramanathan Akshay Krishna¹, Challapalle Nagadastagiri Reddy¹, Li Zheyu¹, Narayanan Vijaykrishnan¹

Affiliation:

1. The Pennsylvania State University, Dept of Electrical Eng and Comp Sci, USA

Abstract

Video analytics has a wide range of applications and has attracted much interest over the years. Although it can be both computationally intensive and energy-intensive, video analytics can benefit greatly from in/near-memory compute. The practice of moving compute closer to memory continues to show improvements in performance and energy consumption and is seeing increasing adoption. Recent solid-state drives (SSDs) incorporate near-memory Field Programmable Gate Arrays (FPGAs) with shared access to the drive’s storage cells. These near-memory FPGAs are capable of running operations required by video analytics pipelines, such as object detection and template matching. These operations are typically executed using Convolutional Neural Networks (CNNs). A CNN is composed of multiple individually processed layers that perform various image processing tasks. Because resources are limited, a layer may be partitioned into more manageable sub-layers. These sub-layers are usually processed sequentially, although some can be processed simultaneously. Moreover, the storage cells within FPGA-equipped SSDs can be augmented with in-storage compute to accelerate CNN workloads and exploit the parallelism within a CNN layer. To this end, we present an in/near-storage acceleration solution for video analytics that leverages heterogeneous architectures. We designed a NAND flash accelerator and an FPGA accelerator, then mapped and evaluated several CNN benchmarks. We show how to utilize FPGAs, local DRAMs, and in-memory SSD compute to accelerate CNN workloads. Our work also demonstrates how to remove unnecessary memory transfers to save latency and energy.

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering,Hardware and Architecture,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3597496

Reference 45 articles.

1. Compute Caches

2. Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder. 2016. Fused-layer CNN accelerators. In 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’16). IEEE, 1–12.

3. SungHa Baek, Youngdon Jung, Aziz Mohaisen, Sungjin Lee, and DaeHun Nyang. 2018. SSD-Insider: Internal defense of solid-state drive against ransomware with perfect data recovery. In IEEE 38th International Conference on Distributed Computing Systems (ICDCS’18). 875–884. DOI:10.1109/ICDCS.2018.00089

4. Yu-Hsin Chen et al. 2016. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-state Circ.

5. Ming Cheng, Lixue Xia, Zhenhua Zhu, Yi Cai, Yuan Xie, Yu Wang, and Huazhong Yang. 2017. TIME: A training-in-memory architecture for memristor-based deep neural networks. In 54th ACM/EDAC/IEEE Design Automation Conference (DAC’17). IEEE, 1–6.

Cited by 1 article.

1. Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators;Journal of Parallel and Distributed Computing;2024-07