1. NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration At Every Scale (2020). https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
2. The x86 Advanced Matrix Extension (AMX) Brings Matrix Operations; To Debut with Sapphire Rapids (2020). https://fuse.wikichip.org/news/3600/the-x86-advanced-matrix-extension-amx-brings-matrix-operations-to-debut-with-sapphire-rapids/
3. Dongarra, J., Hammarling, S., Higham, N., Relton, S., Valero-Lara, P., Zounon, M.: The design and performance of batched BLAS on modern high-performance computing systems. Proc. Comput. Sci. 108, 495–504 (2017). https://doi.org/10.1016/j.procs.2017.05.138
4. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
5. Gustafson, J.L.: Amdahl’s Law. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, vol. xx, pp. 53–60. Springer US, Boston, MA (2011). https://doi.org/10.1007/978-0-387-09766-4_77