1. DL: A data layout transformation system for heterogeneous computing;Sung,2012
2. The gem5 simulator;Binkert;ACM SIGARCH Comput Archit News,2011
3. Analyzing CUDA workloads using a detailed GPU simulator;Bakhoda,2009
4. CUDA C Programming Guide;NVIDIA,2017
5. Fermi GF100 GPU architecture;Wittenbrink;IEEE Micro,2011