CHIP-KNNv2: A Configurable and High-Performance K-Nearest Neighbors Accelerator on HBM-based FPGAs

Author:

Liu Kenneth (1), Lu Alec (1), Samtani Kartik (1), Fang Zhenman (1), Guo Licheng (2)

Affiliation:

1. School of Engineering Science, Simon Fraser University, Canada

2. Computer Science Department, University of California, Los Angeles, United States

Abstract

The k-nearest neighbors (KNN) algorithm is essential to many applications, such as similarity search, image classification, and database query. With the rapid growth in dataset size and in the feature dimension of each data point, KNN processing is becoming more compute- and memory-hungry. Most prior studies focus on accelerating the computation of KNN using the abundant parallel resources on FPGAs. However, they often overlook memory access optimizations on FPGA platforms and achieve only a marginal speedup over a multi-threaded CPU implementation for large datasets. In this article, we design and implement CHIP-KNN: an HLS-based, configurable, and high-performance KNN accelerator. CHIP-KNN optimizes off-chip memory access on modern HBM-based FPGAs such as the AMD/Xilinx Alveo U280 FPGA board. CHIP-KNN is configurable for all essential parameters used in the algorithm, including the size of the search dataset, the feature dimension and data type representation of each data point, the distance metric, and the number of nearest neighbors, K. In terms of design architecture, we explore and discuss the tradeoffs between two design versions: CHIP-KNNv1 (Ping-Pong buffer based) and CHIP-KNNv2 (streaming-based). Moreover, we investigate the routing congestion issue in our accelerator design, implement hierarchical structures to shorten critical paths, and integrate an open-source floorplanning optimization tool called TAPA/AutoBridge to eliminate the place-and-route issues. To explore the design space and balance computation and memory access performance, we also build an analytical performance model. Given a user configuration of the KNN parameters, our tool can automatically generate TAPA HLS C code for the optimal accelerator design, along with the corresponding host code, on the HBM-based FPGA platform.
Our experimental results on the Alveo U280 show that, compared to a 48-thread CPU implementation, CHIP-KNNv2 achieves a geomean performance speedup of 15×, with a maximum speedup of 45×. Additionally, we show that CHIP-KNNv2 achieves up to 2.1× performance speedup over CHIP-KNNv1 while increasing configurability. Compared with the state-of-the-art Facebook AI Similarity Search (FAISS) [23] GPU implementation running on an Nvidia Tesla V100 GPU, CHIP-KNNv2 achieves an average latency reduction of 30.6× while requiring 34.3% of the GPU's power consumption.
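For readers unfamiliar with the algorithm, the configurable parameters named in the abstract (dataset size, feature dimension, distance metric, and K) can be illustrated with a minimal software sketch of brute-force KNN. This is only a reference illustration under stated assumptions, not the CHIP-KNN accelerator design, which is generated as TAPA HLS C code; the function names `knn`, `manhattan`, and `squared_euclidean` are hypothetical and do not come from the article.

```python
import heapq


def manhattan(a, b):
    """L1 distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared L2 distance; the square root is not needed for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query, dataset, k, distance=squared_euclidean):
    """Return the k smallest (distance, index) pairs, nearest first.

    Every parameter here is an illustrative stand-in for a configurable
    KNN parameter: len(dataset) is the search dataset size, len(query) is
    the feature dimension, `distance` is the metric, and `k` is K.
    """
    scored = ((distance(query, point), idx) for idx, point in enumerate(dataset))
    return heapq.nsmallest(k, scored)


# Toy data (made up for illustration only).
dataset = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
nearest = knn((0.9, 1.2), dataset, k=2)
nearest_indices = [idx for _, idx in nearest]
```

The sketch scans every point for each query, so its cost grows with both the dataset size and the feature dimension, which is the compute and memory pressure the abstract describes.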

Funder

NSERC Discovery

Canada Foundation for Innovation John R. Evans Leaders Fund and British Columbia Knowledge Development Fund

Simon Fraser University New Faculty Start-up

Huawei, Xilinx, and Nvidia

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

References (44 articles)

1. Accelerated Approximate Nearest Neighbors Search Through Hierarchical Product Quantization

2. An introduction to kernel and nearest-neighbor nonparametric regression;Altman N. S.;The American Statistician,1992

3. G. Aparício, I. Blanquer, and V. Hernández. 2007. A parallel implementation of the K nearest neighbors classifier in three levels: Threads, MPI processes and the grid. In Proceedings of the High Performance Computing for Computational Science. Michel Daydé, José M. L. M. Palma, Álvaro L. G. A. Coutinho, Esther Pacitti, and João Correia Lopes (Eds.), Springer, Berlin, 225–235.

4. Sunil Arya and David M. Mount. 1998. ANN: Library for approximate nearest neighbor searching. In Proceedings of the IEEE CGC Workshop on Computational Geometry. 33–40.

5. An optimal algorithm for approximate nearest neighbor searching fixed dimensions

Cited by 3 articles.
