1. Hazelwood et al., Applied machine learning at Facebook: a datacenter infrastructure perspective (2018).
2. Y. You, Z. Zhang, C.-J. Hsieh, J. Demmel, K. Keutzer, 100-epoch ImageNet training with AlexNet in 24 minutes, arXiv e-prints (2017).
3. Mittal, A survey of techniques for architecting and managing GPU register file, IEEE Trans. Parallel Distrib. Syst. (TPDS) (2016).
4. Yu et al., Scalpel: customizing DNN pruning to the underlying hardware parallelism (2017).
5. Li et al., A coordinated tiling and batching framework for efficient GEMM on GPUs (2019).