1. Anatomy of high-performance matrix multiplication;Goto;ACM Transactions on Mathematical Software,2008
2. Achieving high sustained performance in an unstructured mesh CFD application;Anderson,1999
3. J.M. Dennis, Automated memory analysis: improving the design and implementation of iterative algorithms, Ph.D. thesis, University of Colorado, Boulder, CO, 2005.
4. G.W. Howell, J.W. Demmel, C.T. Fulton, S. Hammarling, K. Marmol, Cache efficient bidiagonalization using BLAS 2.5 operators, ACM Transactions on Mathematical Software (TOMS) (3), 2008, 34
5. Avoiding communication in sparse matrix computations;Demmel,2008