1. Andreolli, C., Thierry, P., Borges, L., Skinner, G., Yount, C.: Characterization and optimization methodology applied to stencil computations. In: Reinders, J., Jeffers, J. (eds.) High Performance Parallelism Pearls. Morgan Kaufmann, Boston (2015)
2. Carrijo Nasciutti, T., Panetta, J., Pais Lopes, P.: Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs. Concurr. Comput.: Pract. Exp. 31, e4929 (2018)
3. Corbet, J.: Toward better NUMA scheduling (2012). http://lwn.net/Articles/486858/
4. Cruz, E.H., Diener, M., Alves, M.A., Pilla, L.L., Navaux, P.O.: LAPT: a locality-aware page table for thread and data mapping. Parallel Comput. 54, 59–71 (2016)
5. Cruz, E.H., Diener, M., Serpa, M.S., Navaux, P.O.A., Pilla, L., Koren, I.: Improving communication and load balancing with thread mapping in manycore systems. In: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 93–100. IEEE (2018)