The Next 700 Accelerated Layers

Published:2019-12-31 Issue:4 Volume:16 Page:1-26
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
Language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Vasilache Nicolas¹,Zinenko Oleksandr²,Theodoridis Theodoros³,Goyal Priya⁴,Devito Zachary⁵,Moses William S.⁶,Verdoolaege Sven⁷,Adams Andrew⁵,Cohen Albert⁸

Affiliation:

1. Facebook AI Research, NY, USA

2. Inria and ENS, Paris, France

3. ETH Zürich, Zürich, Switzerland

4. Facebook AI Research, New York City, NY, USA

5. Facebook AI Research, Menlo Park, CA, USA

6. MIT CSAIL, Cambridge, MA, USA

7. Polly Labs and Facebook AI Research, Leuven, Belgium

8. Inria, ENS and Facebook AI Research, Paris, France

Abstract

Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries.

Funder

Facebook to ETH Zürich

European Commission through the MNEMOSENE project

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Information Systems,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3355606

Reference: 75 articles.

1. Using Machine Learning to Focus Iterative Optimization

2. Scanning polyhedra with DO loops

Cited by 29 articles.

1. (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms;ACM Transactions on Programming Languages and Systems;2024-05-22

2. Retargeting and Respecializing GPU Workloads for Performance Portability;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02

3. Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations;ACM Transactions on Computer Systems;2023-11-30

4. HAOTuner: A Hardware Adaptive Operator Auto-Tuner for Dynamic Shape Tensor Compilers;IEEE Transactions on Computers;2023-11

5. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs;Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2023-02-21