1. Abdelfattah, A., Anzt, H., Boman, E.G., Carson, E., Cojean, T., Dongarra, J., Gates, M., Grützmacher, T., Higham, N.J., Li, S., Lindquist, N., Liu, Y., Loe, J., Luszczek, P., Nayak, P., Pranesh, S., Rajamanickam, S., Ribizel, T., Smith, B., Swirydowicz, K., Thomas, S., Tomov, S., Tsai, Y.M., Yamazaki, I., Yang, U.M.: A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic. arXiv:2007.06674 [cs, math] (2020)
2. Adámek, K., Dimoudi, S., Giles, M., Armour, W.: GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory. arXiv:1910.01972 [cs] (2020)
3. Anderson, A., Vasudevan, A., Keane, C., Gregg, D.: Low-memory GEMM-based convolution algorithms for deep neural networks. arXiv:1709.03395 [cs] (2017)
4. Barabasz, B., Anderson, A., Gregg, D.: Improving the Accuracy of Winograd Convolution for Deep Neural Networks, p. 18 (2018)
5. Barrachina, S., Dolz, M.F., San Juan, P., Quintana-Ortí, E.S.: Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors. J. Parallel Distrib. Comput. 167, 240–254 (2022). https://doi.org/10.1016/j.jpdc.2022.05.009