Abstract
This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm, NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU, and quantisation on Xilinx’s programmable Z7020 FPGA. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% higher latency, with throughput of 100–590 image frames per second (FPS). Quantisation identifies a sweet spot at 3-bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2- and 3-bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62× speedup.
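The abstract's 2- and 3-bit results assume reducing weights to a handful of discrete levels (4 and 8 respectively). As a rough illustration only, the sketch below shows uniform post-training quantisation of a weight tensor to n bits; the paper's actual quantisation scheme is not detailed in the abstract, and the function name and range-based scaling here are assumptions.

```python
import numpy as np

def quantise(weights, bits):
    """Uniformly quantise a float array to 2**bits levels spanning its
    observed range, then map the integer codes back to float values.
    (Illustrative only; not necessarily the paper's exact scheme.)"""
    levels = 2 ** bits
    w = np.asarray(weights, dtype=np.float64)
    w_min, w_max = w.min(), w.max()
    if w_max == w_min:                      # constant tensor: nothing to quantise
        return w.copy()
    scale = (w_max - w_min) / (levels - 1)  # step between adjacent levels
    codes = np.round((w - w_min) / scale)   # integer codes in 0 .. levels-1
    return codes * scale + w_min            # dequantised approximation

w = np.linspace(-1.0, 1.0, 100)
w3 = quantise(w, 3)  # at most 8 distinct values, the paper's "sweet spot"
```

At 3 bits every weight collapses onto one of 8 values, which is what makes the FPGA implementations small enough to parallelise heavily.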
Funder
Engineering and Physical Sciences Research Council
National Cyber Security Center, UK
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Cited by 14 articles.