1. High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution
2. Reformulating the direct convolution for high-performance deep learning inference on ARM processors
3. Compiling for the IBM Matrix Engine for Enterprise Workloads
4. Kumar Chellapilla, Sidd Puri, and Patrice Simard. 2006. High performance convolutional neural networks for document processing. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition, Guy Lorette (Ed.). Université de Rennes 1, Suvisoft, La Baule (France). Retrieved from https://hal.inria.fr/inria-00112631
5. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI’18). USENIX Association, 579–594.