An Application-oblivious Memory Scheduling System for DNN Accelerators

Published: 2022-09-16 Issue: 4 Volume: 19 Page: 1-26
ISSN: 1544-3566
Container-title: ACM Transactions on Architecture and Code Optimization
Language: en
Short-container-title: ACM Trans. Archit. Code Optim.

Author:

Li Jiansong¹, Wang Xueying², Chen Xiaobing², Li Guangli², Dong Xiao³, Zhao Peng⁴, Yu Xianzhi⁴, Yang Yongxin², Cao Wei², Liu Lei², Feng Xiaobing²

Affiliation:

1. Huawei Galois Lab, Beijing, China

2. Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences, Shijingshan District, Beijing, China

3. NVIDIA Corporation, Shanghai, China

4. Huawei 2012 Lab, Beijing, China

Abstract

Deep Neural Networks (DNNs) tend to go deeper and wider, which poses a significant challenge to DNN training because of the limited memory capacity of DNN accelerators. Existing solutions for memory-efficient DNN training are tightly coupled with the application features of DNN workloads; e.g., they require the layer structures or computational graphs of the DNNs. This results in weak versatility for DNNs with sophisticated layer structures or complicated computation graphs: these schemes usually need to be re-implemented or re-adapted to handle the new layer structures or unusual operators in the computational graphs that such DNNs introduce. In this article, we review the memory pressure issues of DNN training from the perspective of runtime systems and model the memory access behaviors of DNN workloads. We identify the iterative, regularity, and extremalization properties of memory access patterns for DNN workloads. Based on these observations, we propose AppObMem, an application-oblivious memory scheduling system. AppObMem automatically traces the memory behaviors of DNN workloads and schedules memory swapping to reduce the memory pressure on the device accelerators, without requiring high-level information about layer structures or computation graphs. Evaluations on a variety of DNN models show that AppObMem obtains 40–60% memory savings with acceptable performance loss. AppObMem is also competitive with other open-source SOTA schemes.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3535355

References: 71 articles.

1. Apache Singa Team. 2020. Singa: A Distributed Deep Learning Library. Retrieved September 20, 2020 from http://singa.apache.org/.

2. Javier Artiles and Satoshi Sekine. 2008. Tagged and Cleaned Wikipedia (TC Wikipedia) and Its Ngram. Retrieved September 20, 2020 from https://nlp.cs.nyu.edu/wikipedia-data/.

3. Olivier Beaumont, Lionel Eyraud-Dubois, and Alena Shilova. 2019. Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training. Retrieved from https://hal.inria.fr/hal-02316266.

4. Giuseppe Bonaccorso. 2017. Machine Learning Algorithms: A Reference Guide to Popular Algorithms for Data Science and Machine Learning. Packt Publishing.

5. Cambricon. 2020. Cambricon BANG C Developer Guide. Retrieved September 20, 2020 from http://www.cambricon.com/docs/bangc/developer_guide_html/.

Cited by 1 article.

1. Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference; ACM Transactions on Architecture and Code Optimization; 2023-10-26