1. Amdahl, G.M.: Validity of the single-processor approach to achieving large scale computing capabilities. In: AFIPS Conference Proceedings, April 18-20, vol. 30, pp. 483–485. AFIPS Press, Reston (1967)
2. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA (1999)
3. Bell, N., Garland, M.: The impact of cache misses on the performance of matrix product algorithms on multicore platforms. Research Report NVR-2008-004 (December 2008),
http://hal.inria.fr/inria-00537822/en/
4. Blackford, L.S., et al.: An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw. 28(2), 135–151 (2002)
5. Clarke, D.: Lecture Notes in Computer Science (2012)