1. Timothy G. Armstrong, Justin M. Wozniak, Michael Wilde, and Ian T. Foster. 2014. Compiler Techniques for Massively Scalable Implicit Task Parallelism. In Proceedings of the 26th International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’14). IEEE, New Orleans, LA, USA, 299–310.
2. Li-Wen Chang, Izzat El Hajj, Christopher Rodrigues, Juan Gómez-Luna, and Wen-mei Hwu. 2016. Efficient Kernel Synthesis for Performance Portable Programming. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’16). IEEE, Taipei, Taiwan, 12:1–12:13.
3. Huimin Cui, Lei Wang, Jingling Xue, Yang Yang, and Xiaobing Feng. 2011. Automatic Library Generation for BLAS3 on GPUs. In 25th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2011, Anchorage, Alaska, USA, 16-20 May, 2011 - Conference Proceedings. IEEE, 255–265.

CC ’19, February 16–17, 2019, Washington, DC, USA Y. Liu, L. Huang, M. Wu, H. Cui, F. Lv, X. Feng, and J. Xue

4. Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, and Dongrui Fan. 2012. Extendable pattern-oriented optimization directives. ACM Transactions on Architecture and Code Optimization 9, 3 (2012), 14.
5. Huimin Cui, Qing Yi, Jingling Xue, and Xiaobing Feng. 2013. Layout-Oblivious Compiler Optimization for Matrix Computations. ACM Transactions on Architecture and Code Optimization 9, 4 (2013), 1–20.