Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA

Author:

Liu Shuanglong¹, Fan Hongxiang¹, Niu Xinyu¹, Ng Ho-cheung¹, Chu Yang¹, LUK Wayne¹

Affiliation:

1. Imperial College London, London, UK

Abstract

Algorithms based on Convolutional Neural Networks (CNNs) have been successful in solving image recognition problems, showing large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and in models that support tasks such as image segmentation and super-resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which practitioners and researchers widely use to accelerate CNN algorithms because of their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules, together with other optimization techniques, is proposed for the FPGA-based CNN accelerator. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space, so as to achieve the optimal processing speed of the system and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board; the deconvolution accelerator achieves a performance of 90.1 giga operations per second (GOPS) at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time scene segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board, and the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization and supports up to 17 frames per second for 512 × 512 image inputs with a power consumption of only 9.6 W.
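The headline figures in the abstract can be combined into a few derived quantities. The following Python sketch is illustrative arithmetic only and is not taken from the paper. It assumes that the quoted throughput, frame rate, and power figures refer to the same operating point, which the abstract does not state. The function names are hypothetical. Because the quoted densities are rounded to two decimals, the implied DSP counts are approximate.

```python
# Figures quoted in the abstract (Xilinx Zynq ZC706 results).
DECONV_GOPS = 90.1          # deconvolution accelerator throughput, GOPS
DECONV_GOPS_PER_DSP = 0.10  # deconvolution performance density
CNN_GOPS = 107.0            # CNN accelerator throughput, GOPS
CNN_GOPS_PER_DSP = 0.12     # CNN performance density
FPS = 17.0                  # "up to" frame rate for 512 x 512 inputs
POWER_W = 9.6               # reported power consumption, watts


def implied_dsp_count(gops: float, gops_per_dsp: float) -> float:
    """DSP slices implied by throughput divided by performance density."""
    return gops / gops_per_dsp


def power_efficiency(gops: float, watts: float) -> float:
    """Throughput per watt (GOPS/W)."""
    return gops / watts


def energy_per_frame(watts: float, fps: float) -> float:
    """Joules per frame, ASSUMING the power figure applies at this frame rate."""
    return watts / fps


def sustained_gop_per_frame(gops: float, fps: float) -> float:
    """Giga-operations available per frame, ASSUMING throughput is sustained."""
    return gops / fps


if __name__ == "__main__":
    print("Implied DSPs (deconv):", implied_dsp_count(DECONV_GOPS, DECONV_GOPS_PER_DSP))
    print("Implied DSPs (CNN):   ", implied_dsp_count(CNN_GOPS, CNN_GOPS_PER_DSP))
    print("GOPS per watt (CNN):  ", power_efficiency(CNN_GOPS, POWER_W))
    print("Joules per frame:     ", energy_per_frame(POWER_W, FPS))
    print("GOP per frame:        ", sustained_gop_per_frame(CNN_GOPS, FPS))
```

Under these assumptions, both accelerators imply roughly 900 DSP slices in use, and the CNN system works out to about 11 GOPS/W and a little under 0.6 J per frame. These are rough estimates, not results reported by the authors.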

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 44 articles.

1. Conglomeration of Deep Neural Network and Quantum Learning for Object Detection: Status Quo Review;Knowledge-Based Systems;2024-02

2. Analysis of Hardware-Implemented U-Net–Like Convolutional Neural Networks;Communications in Computer and Information Science;2024

3. Design and Development of an Optimized Fast Transformation Module (OpFTM) for GAN Accelerator with Computation Efficiency;Lecture Notes in Electrical Engineering;2024

4. An Efficient Dataflow for Convolutional Generative Models;2023 International Conference on Field Programmable Technology (ICFPT);2023-12-12

5. Enhancing Image Segmentation Performance with MRAM based Processing-in-Memory Architecture;2023 IEEE Nanotechnology Materials and Devices Conference (NMDC);2023-10-22
