1. Agarwal A, Levy M (2007) The kill rule for multicore. In: DAC’07: proceedings of the 44th annual design automation conference, New York, NY, USA. ACM, New York, pp 750–753
2. Agullo E, Demmel J, Dongarra J, Hadri B, Kurzak J, Langou J, Ltaief H, Luszczek P, Tomov S (2009) Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. J Phys Conf Ser 180(1):012037
3. Ament M, Knittel G, Weiskopf D, Strasser W (2010) A parallel preconditioned conjugate gradient solver for the Poisson problem on a Multi-GPU platform. In: PDP’10: proceedings of the 2010 18th euromicro conference on parallel, distributed and network-based processing, Washington, DC, USA. IEEE Comput. Soc., Los Alamitos, pp 583–592
4. Baskaran MM, Bordawekar R (2008) Optimizing sparse matrix-vector multiplication on GPUs. Tech report, IBM Research
5. Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC’09: proceedings of the conference on high performance computing networking, storage and analysis, New York, NY, USA. ACM, New York, pp 1–11