1. Kosec et al., "Packing: Towards 2x NLP BERT acceleration," 2021.
2. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics Hum. Lang. Technol., 2019.
3. "OpenBLAS: An optimized BLAS library," 2021.
4. Zheng et al., "DietCode: Automatic optimization for dynamic tensor programs," in Proc. Mach. Learn. Syst., 2022.
5. Ahn et al., "Chameleon: Adaptive code optimization for expedited deep neural network compilation," in Proc. 8th Int. Conf. Learn. Representations, 2020.