1. Bala, K., Kaashoek, M.F., Weihl, W.E.: Software prefetching and caching for translation lookaside buffers. In: Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation, p. 18. USENIX Association (1994)
2. Beaumont, O., Boudet, V., Robert, Y., et al.: A realistic model and an efficient heuristic for scheduling with heterogeneous processors (2001)
3. Belviranli, M.E., Bhuyan, L.N., Gupta, R.: A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. ACM Trans. Archit. Code Optim. (TACO) 9(4), 57 (2013)
4. Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J.: DAGuE: a generic distributed DAG engine for high performance computing. Parallel Comput. 38(1), 37–51 (2012)
5. Chen, J., Tao, X., Yang, Z., Peir, J.-K., Li, X., Lu, S.-L.: Guided region-based GPU scheduling: utilizing multi-thread parallelism to hide memory latency. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 441–451. IEEE (2013)