1. Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters;Agarwal,2014
2. Using the Intel MPI benchmarks (IMB) to evaluate MPI implementations on an Infiniband Nehalem Linux cluster;Bukhamsin,2010
3. Performance analysis of a hybrid MPI/OpenMP application on multi-core clusters;Chorley;Journal of Computational Science,2010
4. Transformations to parallel codes for communication-computation overlap;Danalis,2005
5. A multilevel parallelization framework for high-order stencil computations;Dursun,2009