1. Intel 64 and IA-32 Architectures Optimization Reference Manual. Intel Press, June 2016. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
2. Afzal, A., Hager, G., Wellein, G.: Desynchronization and wave pattern formation in MPI-parallel and hybrid memory-bound programs (2020). https://arxiv.org/abs/2002.02989 . Accepted for ISC High Performance 2020
3. Alappat, C.L., et al.: A recursive algebraic coloring technique for hardware-efficient symmetric sparse matrix-vector multiplication (2020). Accepted for publication in ACM Transactions on Parallel Computing. https://doi.org/10.1145/3399732
4. ARM: ARM Cortex-A75 Core Technical Reference Manual - Write streaming mode. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100403_0200_00_en/lto1473834732563.html . Accessed 26 Mar 2020
5. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011). http://doi.acm.org/10.1145/2049662.2049663