1. Uday Bondhugula. 2020. High Performance Code Generation in MLIR: An Early Case Study with GEMM. arXiv:2003.00532.
2. A practical automatic polyhedral parallelizer and locality optimizer
3. TSM2
4. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018). USENIX Association, Berkeley, CA, USA. 579–594. https://doi.org/10.48550/arXiv.1802.04799
5. Tile size selection using cache organization and data layout