Parallelizing the Chambolle Algorithm for Performance-Optimized Mapping on FPGA Devices

Author:

Beretta Ivan¹, Rana Vincenzo², Akin Abdulkadir¹, Nacci Alessandro Antonio², Sciuto Donatella², Atienza David¹

Affiliation:

1. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

2. Politecnico di Milano, Milano, Italy

Abstract

The performance and efficiency of recent computing platforms have been deeply influenced by the widespread adoption of hardware accelerators, such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), which are often employed to support the tasks of general-purpose processors (GPPs). One of the main advantages of these accelerators over their sequential counterparts (GPPs) is their ability to perform massively parallel computation. However, to exploit this competitive edge, it is necessary to extract the parallelism from the target algorithm to be executed, which is generally a very challenging task. This is demonstrated, for instance, by the poor performance achieved on relevant multimedia algorithms, such as Chambolle, a well-known algorithm employed for optical flow estimation. State-of-the-art implementations of this algorithm are generally based on GPUs but barely improve on the performance that can be obtained with a powerful GPP. In this article, we propose a novel approach to extract the parallelism from computation-intensive multimedia algorithms, which includes an analysis of their dependency schema and an assessment of their data reuse. We then perform a thorough analysis of the Chambolle algorithm, providing a formal proof of its inner data dependencies and locality properties. Then, we exploit the considerations drawn from this analysis by proposing an architectural template that takes advantage of the fine-grained parallelism of FPGA devices. Moreover, since the proposed template can be instantiated with different parameters, we also propose a design metric, the expansion rate, to help the designer estimate the efficiency and performance of the different instances, making it possible to select the right one before the implementation phase. We finally show, by means of experimental results, how the proposed analysis and parallelization approach lead to the design of efficient and high-performance FPGA-based implementations that are orders of magnitude faster than the state-of-the-art ones.

Funder

ONR-G

E4Bio RTD project

Swiss NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Cited by 5 articles.

1. Optical flow algorithms optimized for speed, energy and accuracy on embedded GPUs;Journal of Real-Time Image Processing;2023-03-14

2. Real-time and effective detection of agricultural pest using an improved YOLOv5 network;Journal of Real-Time Image Processing;2023-03-14

3. Defect Measurement in Welded Objects by Radiography Testing and Chambolle’s Image Processing Method;Journal of Nondestructive Evaluation;2021-05-28

4. DCMI;ACM Transactions on Architecture and Code Optimization;2019-12-31

5. A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs using OpenCL Model;Proceedings of the 54th Annual Design Automation Conference 2017;2017-06-18
