Authors:
Wu Yuting, Wang Ziyu, Lu Wei D.
Abstract
Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. In this work, we propose a processing-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41–137× and 631–1074× speedup, and 123–383× and 320–602× energy efficiency, over the GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
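To make the memory-bound bottleneck concrete, the sketch below is an illustrative NumPy analogy only, not the paper's implementation: the model width d_model, the bank count n_banks, the fp16 weight assumption, and the use of array slices as stand-ins for DRAM banks are all assumptions added here. It estimates the arithmetic intensity of a single-token GEMV during autoregressive decoding and shows how a row-wise weight partition lets each bank produce its slice of the output locally, so only small vectors cross the memory interface.

import numpy as np

# Hypothetical sizes for illustration; the paper evaluates GPT models with up to 1.4B parameters.
d_model, n_banks = 1024, 16

# Arithmetic intensity of one GEMV in single-token (autoregressive) decoding:
# every weight is read from memory once and used in exactly one multiply-add.
flops = 2 * d_model * d_model            # one multiply + one add per weight
weight_bytes = 2 * d_model * d_model     # assuming 2 bytes per fp16 weight
print(f"{flops / weight_bytes:.1f} FLOPs per byte of weight traffic")  # ~1.0 -> memory bound

# Row-wise mapping sketch: each "bank" stores a slice of W and computes the
# matching slice of y = W @ x locally, so only the small vectors x and y
# ever move, while the large weight matrix stays in place.
rng = np.random.default_rng(0)
W = rng.standard_normal((d_model, d_model), dtype=np.float32)
x = rng.standard_normal(d_model, dtype=np.float32)
bank_weights = np.array_split(W, n_banks, axis=0)              # weights partitioned across banks
y = np.concatenate([w_bank @ x for w_bank in bank_weights])    # per-bank MAC results
assert np.allclose(y, W @ x)

The partition is only a software analogy for the bank-level parallelism and data locality described in the abstract; in PIM-GPT the MAC operations execute inside the DRAM chips themselves, with the ASIC handling non-linear functions and data communication.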
Funder
Division of Computing and Communication Foundations
Division of Electrical, Communications and Cyber Systems
Semiconductor Research Corporation
Publisher
Springer Science and Business Media LLC
References
48 articles.
Cited by
1 article.