Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators

Published: 2023-02-17
Container-title: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization

Author:

Jeong Hyuk-Jin¹, Yeo JiHwan¹, Bahk Cheongyo¹, Park JongHyun¹

Affiliation:

1. Samsung Research, South Korea

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3579990.3580017

References: 45 articles.

1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, and Michael Isard. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.

2. Arun Abraham, Manas Sahni, and Akshay Parashar. 2019. Efficient Memory Pool Allocation Algorithm for CNN Inference. In 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). 345–352.

3. Phase-based Cache Locking for Embedded Systems

4. Fused-layer CNN accelerators

5. Baker, Brenda S. 1980. SIAM Journal on Computing 9, 4.

Cited by 6 articles.

1. gem5-NVDLA: A Simulation Framework for Compiling, Scheduling, and Architecture Evaluation on AI System-on-Chips;ACM Transactions on Design Automation of Electronic Systems;2024-09-04

2. Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs;The 53rd International Conference on Parallel Processing Workshops;2024-08-12

3. Optimizing code allocation for hybrid on-chip memory in IoT systems;Integration;2024-07

4. LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

5. Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1;2024-04-17