TeMCO: Tensor Memory Compiler Optimization across Tensor Decompositions in Deep Learning Inference

Published: 2024-08-12 Volume: 201 Page: 1114-1123
Container-title: Proceedings of the 53rd International Conference on Parallel Processing

Authors:

Song Seungbin¹, Lee Ju Min¹, Jeong Haeeun¹, Kwon Hyunho¹, Jeong Shinnung¹, Lee Jaeho¹, Kim Hanjun¹

Affiliation:

1. Yonsei University, Republic of Korea

Funder

Ministry of Science and ICT, South Korea

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3673038.3673048

References: 50 articles.

1. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.

2. A. F. Agarap. 2018. Deep Learning using Rectified Linear Units (ReLU). CoRR (2018). arXiv:1803.08375

3. M. Alwani, H. Chen, M. Ferdman, and P. Milder. 2016. Fused-layer CNN accelerators. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12.

4. A. Artemev, Y. An, T. Roeder, and M. van der Wilk. 2022. Memory safe computations with XLA compiler. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.).

5. J. Chen, L. Zheng, Z. Yao, D. Wang, I. Stoica, M. Mahoney, and J. Gonzalez. 2021. ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training. In Proceedings of the 38th International Conference on Machine Learning, M. Meila and T. Zhang (Eds.). 1803–1813.