1. Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-Mei Hwu, and Kaushik Roy. 2020. Panther: A programmable architecture for neural network training harnessing energy-efficient ReRAM. IEEE Trans. Comput. (2020).
2. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, and Jianfeng Gao. 2023. Model tells you what to discard: Adaptive KV cache compression for LLMs. arXiv preprint arXiv:2310.01801 (2023).
3. Kai Han et al. 2021. Transformer in transformer. In Advances in Neural Information Processing Systems (NeurIPS).
4. Mingxuan He, Choungki Song, Ilkon Kim, Chunseok Jeong, Seho Kim, Il Park, Mithuna Thottethodi, and TN Vijaykumar. 2020. Newton: A DRAM-maker’s accelerator-in-memory (AiM) architecture for machine learning. In IEEE/ACM MICRO.
5. Intel. 2024. Memory performance in a nutshell. https://www.intel.com/content/www/us/en/developer/articles/technical/memory-performance-in-a-nutshell.html.