Halide

Published: 2013-06-23 Issue: 6 Volume: 48 Page: 519-530
ISSN: 0362-1340
Container-title: ACM SIGPLAN Notices
Language: en
Short-container-title: SIGPLAN Not.

Author:

Jonathan Ragan-Kelley¹, Connelly Barnes², Andrew Adams¹, Sylvain Paris², Frédo Durand¹, Saman Amarasinghe¹

Affiliation:

1. Massachusetts Institute of Technology, Cambridge, MA, USA

2. Adobe, Cambridge, MA, USA

Abstract

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
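The tension the abstract describes between locality and redundant recomputation can be illustrated with a minimal sketch. This is plain Python, not Halide code; the function names (`blur_x`, `breadth_first`, `fused`) and the two-stage 1D box blur are assumptions chosen for illustration. Both functions compute the same result, but they order the work differently, which is the kind of choice a schedule expresses.

```python
def blur_x(inp, x):
    """Producer stage: 3-tap box filter over the input, clamped at the edges."""
    n = len(inp)
    return (inp[max(x - 1, 0)] + inp[x] + inp[min(x + 1, n - 1)]) / 3.0


def breadth_first(inp):
    """Compute the whole producer stage first, then the consumer stage.

    No producer value is recomputed, but the full intermediate buffer
    must be stored and re-read (poor locality for large images).
    """
    n = len(inp)
    producer_calls = 0
    stage1 = []
    for x in range(n):
        stage1.append(blur_x(inp, x))
        producer_calls += 1
    out = []
    for x in range(n):
        a = stage1[max(x - 1, 0)]
        b = stage1[x]
        c = stage1[min(x + 1, n - 1)]
        out.append((a + b + c) / 3.0)
    return out, producer_calls


def fused(inp):
    """Inline the producer into the consumer.

    No intermediate buffer is needed (good locality), but each producer
    value is recomputed for every consumer that reads it.
    """
    n = len(inp)
    producer_calls = 0
    out = []
    for x in range(n):
        a = blur_x(inp, max(x - 1, 0))
        b = blur_x(inp, x)
        c = blur_x(inp, min(x + 1, n - 1))
        producer_calls += 3
        out.append((a + b + c) / 3.0)
    return out, producer_calls


if __name__ == "__main__":
    data = [float(v) for v in range(16)]
    out_a, calls_a = breadth_first(data)
    out_b, calls_b = fused(data)
    print(out_a == out_b, calls_a, calls_b)
```

In this sketch the fused version evaluates the producer three times as often as the breadth-first version while never materializing the intermediate stage. Intermediate choices, such as computing the producer per tile, sit between these two extremes.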

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2499370.2462176

Cited by 657 articles.

1. Abstractions for C++ code optimizations in parallel high-performance applications; Parallel Computing; 2024-09

2. CG-Kit: Code Generation Toolkit for performant and maintainable variants of source code applied to Flash-X hydrodynamics simulations; Future Generation Computer Systems; 2024-09

3. Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture; ACM Transactions on Architecture and Code Optimization; 2024-08-21

4. FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs; Proceedings of the 53rd International Conference on Parallel Processing; 2024-08-12

5. A high-performance dataflow-centric optimization framework for deep learning inference on the edge; Journal of Systems Architecture; 2024-07