LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators

Published: 2024-05-30
Container-title: Proceedings of the 38th ACM International Conference on Supercomputing

Author:

Lai Chengtao¹, Zhou Zhongchun¹, Poptani Akash², Zhang Wei³

Affiliation:

1. The Hong Kong University of Science and Technology, Hong Kong

2. Indian Institute of Technology Dharwad, India

3. Hong Kong University of Science and Technology, Hong Kong Special Administrative Region of China

Funder

AI Chip Center for Emerging Smart Systems

Huawei Hong Kong Research Center

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3650200.3656592

Reference: 56 articles.

1. IATAC: a smart predictor to turn-off L2 cache lines

2. A Multi-Neural Network Acceleration Architecture

3. Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arXiv:2204.06745 [cs.CL]

4. M. Brehob and R. Enbody. 1999. An Analytical Model of Locality and Caching. Tech. Rep. MSU-CSE-99-31. Michigan State University, Department of Computer Science and Engineering.

5. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, 2018. TVM: An automated End-to-End optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578–594.

Cited by 1 article.

1. Fixed-Point Arithmetic Analysis for Development of LLaMA 3 On-Device Accelerator; Journal of Broadcast Engineering; 2024-07-31