Affiliation:
1. North Carolina State University
Abstract
This article describes a scalable, configurable, cluster-based hierarchical hardware accelerator built on a custom architecture for Sparsey, a cortical learning algorithm. Sparsey is inspired by the operation of the human cortex and uses a Sparse Distributed Representation to enable unsupervised learning and inference within the same algorithm. A distributed on-chip memory organization is designed and implemented in custom hardware to improve memory bandwidth and accelerate read/write operations on the synaptic weight matrices. Bit-level data are processed from the distributed on-chip memory, and custom multiply-accumulate hardware is implemented for binary and fixed-point multiply-accumulate operations. Fixed-point arithmetic and fixed-point storage are also adopted in this implementation. At 16 nm, the custom Sparsey hardware achieved an overall 24.39× speedup, 353.12× better energy efficiency per frame, and a 1.43× reduction in silicon area compared with a state-of-the-art GPU.
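To make the binary and fixed-point multiply-accumulate datapaths concrete, the following C sketch shows the two operations in software form. It is illustrative only and not taken from the paper: the bit-packed 1-bit weight layout, the Q7.8 fixed-point format, and all function names and widths are assumptions made for the example.

#include <stdint.h>
#include <stdio.h>

/* Binary MAC: with 1-bit synaptic weights, multiplication degenerates
 * into a conditional add, so each bit of the packed weight word either
 * passes or blocks the corresponding input. */
static int32_t binary_mac(const uint64_t *weights_packed,
                          const int8_t *inputs, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        /* Test bit i of the bit-packed weight row. */
        if ((weights_packed[i / 64] >> (i % 64)) & 1u)
            acc += inputs[i];
    }
    return acc;
}

/* Fixed-point MAC: Q7.8 weights times Q7.8 inputs accumulate into a
 * wide register; one shift at the end restores the Q7.8 scale. */
static int32_t fixedpoint_mac(const int16_t *weights,
                              const int16_t *inputs, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)weights[i] * inputs[i];  /* Q14.16 partial sums */
    return (int32_t)(acc >> 8);                  /* back to 8 fraction bits */
}

int main(void) {
    int8_t in8[4] = {3, -2, 5, 7};
    uint64_t w = 0xBu;  /* bits 0,1,3 set: binary weights 1,1,0,1 */
    printf("binary MAC: %d\n", binary_mac(&w, in8, 4));  /* 3 - 2 + 7 = 8 */

    int16_t w16[2] = {256, 512};  /* 1.0 and 2.0 in Q7.8 */
    int16_t x16[2] = {384, 128};  /* 1.5 and 0.5 in Q7.8 */
    printf("fixed-point MAC (Q7.8): %d\n", fixedpoint_mac(w16, x16, 2));
    /* 1.0*1.5 + 2.0*0.5 = 2.5, i.e. 640 in Q7.8 */
    return 0;
}

The binary path suggests why bit-level processing pays off in hardware: a 1-bit weight removes the multiplier entirely, leaving only a gated adder, while the fixed-point path trades the cost of floating point for a fixed shift at the end of the accumulation.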
Funder
U.S. Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL)
Publisher
Association for Computing Machinery (ACM)
Subject
Electrical and Electronic Engineering, Hardware and Architecture, Software