
PuDianNao

Published:2015-05-29 Issue:1 Volume:43 Page:369-381
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Liu Daofu¹,Chen Tianshi²,Liu Shaoli²,Zhou Jinhong³,Zhou Shengyuan²,Temam Olivier⁴,Feng Xiaobing²,Zhou Xuehai⁵,Chen Yunji⁶

Affiliation:

1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

3. University of Science and Technology of China, Hefei, China

4. Inria, Paris, France

5. University of Science and Technology of China, Hefei, China

6. Institute of Computing Technology, Chinese Academy of Sciences/CAS Center for Excellence in Brain Science, Beijing, China

Abstract

Machine Learning (ML) techniques are pervasive tools in various emerging commercial applications, but they must be supported by powerful computer systems to process very large data. Although general-purpose CPUs and GPUs provide straightforward solutions, their energy efficiency is limited by their excessive support for flexibility. Hardware accelerators may achieve better energy efficiency, but each accelerator often accommodates only a single ML technique (or family). According to the well-known No-Free-Lunch theorem in the ML domain, however, an ML technique that performs well on one dataset may perform poorly on another, which implies that such an accelerator may sometimes lead to poor learning accuracy. Even setting learning accuracy aside, such an accelerator can still become inapplicable simply because the concrete ML task changes or the user chooses another ML technique. In this study, we present an ML accelerator called PuDianNao, which accommodates seven representative ML techniques: k-means, k-nearest neighbors, naive Bayes, support vector machine, linear regression, classification tree, and deep neural network. Benefiting from our thorough analysis of the computational primitives and locality properties of different ML techniques, PuDianNao can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm^2, and consumes only 596 mW. Compared with the NVIDIA K20M GPU (28nm process), PuDianNao (65nm process) is 1.20x faster and reduces energy by 128.41x.
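The headline figures in the abstract can be combined into a few derived metrics. The sketch below is illustrative arithmetic only, not a calculation taken from the paper: it assumes the peak throughput and the quoted power figure can be paired directly, and that the speedup and energy-reduction factors refer to the same workloads. The function name `derived_metrics` and the dictionary keys are hypothetical.

```python
# Figures quoted in the abstract.
PEAK_GOPS = 1056.0         # peak throughput, GOP/s
AREA_MM2 = 3.51            # chip area, mm^2
POWER_W = 0.596            # power, W (596 mW)
SPEEDUP_VS_GPU = 1.20      # PuDianNao vs. NVIDIA K20M
ENERGY_REDUCTION = 128.41  # energy reduction vs. NVIDIA K20M


def derived_metrics():
    """Return derived efficiency figures (assumption-laden, see lead-in)."""
    # Throughput per watt, assuming peak throughput at the quoted power.
    gops_per_watt = PEAK_GOPS / POWER_W
    # Throughput per unit area.
    gops_per_mm2 = PEAK_GOPS / AREA_MM2
    # Energy = power * time, so the power ratio is the energy ratio
    # divided by the time ratio (the speedup).
    implied_power_ratio = ENERGY_REDUCTION / SPEEDUP_VS_GPU
    return {
        "gops_per_watt": gops_per_watt,
        "gops_per_mm2": gops_per_mm2,
        "implied_power_ratio": implied_power_ratio,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the quoted numbers correspond to roughly 1772 GOP/s per watt and about 301 GOP/s per mm^2, and the GPU would draw on the order of 107 times more power on average.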

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2786763.2694358

Reference 41 articles.

1. An introduction to kernel and nearest-neighbor nonparametric regression;Altman Naomi S;The American Statistician,1992

2. A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines

Cited by 44 articles.

1. Energy-Efficient Computing Acceleration of Unmanned Aerial Vehicles Based on a CPU/FPGA/NPU Heterogeneous System;IEEE Internet of Things Journal;2024-08-15

2. NEOCNN: NTT-Enabled Optical Convolution Neural Network Accelerator;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

3. IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27

4. Design Implementation of FPGA-Based Neural Network Acceleration;2024 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE);2024-01-12

5. A review of in-memory computing for machine learning: architectures, options;International Journal of Web Information Systems;2023-12-22