DianNao

Published:2014-04-05 Issue:1 Volume:42 Page:269-284
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Chen Tianshi¹, Du Zidong¹, Sun Ninghui¹, Wang Jia¹, Wu Chengyong¹, Chen Yunji¹, Temam Olivier²

Affiliation:

1. ICT, Beijing, China

2. Inria, Saclay, France

Abstract

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2654822.2541967

References: 44 articles.

1. Low-power, high-performance analog neural branch prediction

2. The PARSEC benchmark suite

3. A dynamically configurable coprocessor for convolutional neural networks

4. BenchNN: On the broad potential application scope of hardware neural network accelerators

Cited by 187 articles.

1. ISSA: Architecting CNN Accelerators Using Input-Skippable, Set-Associative Computing-in-Memory;IEEE Transactions on Computers;2024-09

2. EdgePro: Edge Deep Learning Model Protection via Neuron Authorization;IEEE Transactions on Dependable and Secure Computing;2024-09

3. Energy-Efficient Computing Acceleration of Unmanned Aerial Vehicles Based on a CPU/FPGA/NPU Heterogeneous System;IEEE Internet of Things Journal;2024-08-15

4. ALPRI-FI: A Framework for Early Assessment of Hardware Fault Resiliency of DNN Accelerators;Electronics;2024-08-15

5. Scratchpad Memory Management for Deep Learning Accelerators;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12