1. STRATTON J A, ANSSARI N, RODRIGUES C, I J SUNG, OBEID N, CHANG L W, LIU G D, HWU W. Optimization and architecture effects on GPU computing workload performance [C]//Innovative Parallel Computing (InPar). San Jose, USA: IEEE, 2012: 1–10.
2. RYOO S, RODRIGUES C I, BAGHSORKHI S S, STONE S S, KIRK D B, HWU W W. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA [C]//Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08). Utah, USA: ACM, 2008: 73–82.
3. JANG B, DO S, PIEN H, KAELI D. Architecture-aware optimization targeting multithreaded stream computing [C]//Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units-GPGPU-2. Washington DC, USA: ACM, 2009: 62–70.
4. JANG B, SCHAA D, MISTRY P, KAELI D. Exploiting memory access patterns to improve memory performance in data-parallel architectures [J]. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1): 105–118.
5. MEI X, ZHAO K, LIU C, CHU X. Benchmarking the memory hierarchy of modern GPUs [M]. Heidelberg: Springer Berlin, 2014: 144–156.