Cross-Feature Transfer Learning for Efficient Tensor Program Generation

Published: 2024-01-06 Issue: 2 Volume: 14 Page: 513
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Verma Gaurav¹, Raskar Siddhisanket², Emani Murali², Chapman Barbara¹

Affiliation:

1. Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA

2. Argonne National Laboratory, Lemont, IL 60439, USA

Abstract

Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on the target hardware. The complexity of this process is further amplified by the exponential combinations of transformations, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint neural network and hardware features space, facilitating knowledge transfer to new, unseen target hardware. A comprehensive analysis is conducted on the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies and the proposal of methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach substantially reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology demonstrates competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively utilize schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research introduces a significant advancement in autotuning tensor program generation, addressing the complexities associated with heterogeneous environments and showcasing promising results regarding efficiency and accuracy.
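The Pairwise Comparison Accuracy (PCA) mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common definition: the fraction of candidate-program pairs whose predicted ordering agrees with their measured ordering. The function name, the tie handling, and the sample values are illustrative assumptions and are not taken from the paper or from TenSet's code.

```python
from itertools import combinations


def pairwise_comparison_accuracy(predicted, measured):
    """Fraction of pairs ranked in the same order by prediction and measurement.

    Assumed definition (not the paper's code): pairs with equal measured
    values are skipped, and pairs with equal predicted values count as wrong.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")

    correct = 0
    total = 0
    for i, j in combinations(range(len(measured)), 2):
        measured_diff = measured[i] - measured[j]
        if measured_diff == 0:
            continue  # no ground-truth ordering for this pair
        total += 1
        predicted_diff = predicted[i] - predicted[j]
        # Same sign means the cost model ordered the pair correctly.
        if predicted_diff * measured_diff > 0:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical predicted costs and measured latencies for four candidates.
predicted_costs = [0.9, 0.1, 0.5, 0.4]
measured_latencies = [3.0, 1.0, 2.0, 2.5]
pca = pairwise_comparison_accuracy(predicted_costs, measured_latencies)
print(f"PCA = {pca:.3f}")
```

In this sample, five of the six pairs are ordered correctly; only the last two candidates are swapped. A cost model that ranks every pair correctly scores 1.0, which is why PCA is a natural check that pruning a dataset has not degraded ranking quality.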

Funder

Stony Brook Research Computing and Cyberinfrastructure

Argonne Leadership Computing Facility

Exascale Computing Project

National Science Foundation

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/2/513/pdf

References (50 articles)

1. Sabne, A. (2023, November 30). XLA: Compiling Machine Learning for Peak Performance. Available online: https://www.tensorflow.org/xla.

2. Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., Cowan, M., Wang, L., Hu, Y., and Ceze, L. (2018, January 8–10). TVM: An automated end-to-end optimizing compiler for deep learning. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA.

3. Rotem, N., Fix, J., Abdulrasool, S., Catron, G., Deng, S., Dzhabarov, R., Gibson, N., Hegeman, J., Lele, M., and Levenstein, R. (2018). Glow: Graph lowering compiler techniques for neural networks. arXiv.

4. Kjolstad (2017). The Tensor Algebra Compiler. Proc. ACM Program. Lang.

5. Li (2020). The deep learning compiler: A comprehensive survey. IEEE Trans. Parallel Distrib. Syst.

Cited by 3 articles.

1. Multistage transfer learning for medical images;Artificial Intelligence Review;2024-08-06

2. Mulberry fruit powder-enriched bread: Development, antioxidant and enzyme activity inhibition properties;2024-07-04

3. Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30