1. Abraham, S.G., Hudak, D.E.: Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic. IEEE Trans. Parallel Distrib. Syst. 2(3), 318–328 (1991)
2. Datta, K.: Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. Ph.D. thesis, University of California, Berkeley (2009)
3. DeFlumere, A.: Optimal partitioning for parallel matrix computation on a small number of abstract heterogeneous processors. Ph.D. thesis, University College Dublin (2014)
4. Dursun, H., Nomura, K.I., Wang, W., Kunaseth, M., Peng, L., Seymour, R., Kalia, R.K., Nakano, A., Vashishta, P.: In-core optimization of high-order stencil computations. In: PDPTA, pp. 533–538 (2009)
5. Hagen, W., Plauth, M., Eberhardt, F., Polze, A.: PGASUS: a framework for C++ application development on NUMA architectures. In: 2016 Fourth International Symposium on Computing and Networking (CANDAR), pp. 368–374. IEEE, Hiroshima, November 2016