1. J. Appleyard and S. Yokim. 2017. Programming Tensor Cores in CUDA 9. https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
2. F. Ferrandi, V. G. Castellana, S. Curzel, P. Fezzardi, M. Fiorito, M. Lattuada, M. Minutoli, C. Pilato, and A. Tumeo. 2021. Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications. In DAC. ACM, 1327--1330.
3. N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In ISCA. ACM, 1--12.
4. C. Lattner, M. Amini, U. Bondhugula, A. Cohen, A. Davis, J. Pienaar, R. Riddle, T. Shpeisman, N. Vasilache, and O. Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific computation. In CGO. ACM, 2--14.
5. L. Pouchet and others. 2021. PolyBench/C 4.2.1. https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/