Affiliation:
1. Federal University of Rio Grande do Sul, Rio Grande do Sul, Brazil
Abstract
Exploiting computational and data reuse in CNNs is crucial for the successful design of resource-constrained platforms. In image recognition applications, high levels of input locality and redundancy present in CNNs have become the golden goose for skipping costly arithmetic operations. One promising technique for this consists in storing function responses of some input patterns into offline lookup tables and replacing online computation with search operations, which are highly efficient when implemented by emerging non-volatile memory technologies. In this work, we rethink both algorithm and architecture for exploiting locality and reuse opportunities by replacing entire convolutions with searches on Content-addressable Memories. By previously calculating convolution results and building compact lookup tables with our novel clustering algorithm, one can evaluate activations at constant time complexity, also requiring a single read operation of the current input tensor. Then, we devise a reconfigurable array of processing elements based on memristive Ternary Content-addressable Memories to efficiently implement the algorithmic solution and meet the flexibility requirements of several CNN architectures. Results show that our design reduces the number of multiplications and memory accesses proportionally to the number of convolutional layer channels. The average performance is 1,172 and 82 FPS for AlexNet and VGG-16 models, thus outperforming state-of-the-art works by 13×.
Funder
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES)-Finance Code 001
National Council for Scientific and Technological Development
Publisher
Association for Computing Machinery (ACM)
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献