1. Ruslan Arutyunyan. 2022. C++17 Parallel Algorithms and P2300. Technical Report. http://wg21.link/p2500
2. Ben Ashbaugh, James C Brodman, Michael Kinsner, Gregory Lueck, John Pennycook, and Roland Schulz. 2021. Toward a Better Defined SYCL Memory Consistency Model. In International Workshop on OpenCL (IWOCL’21). Association for Computing Machinery, New York, NY, USA, Article 20, 3 pages.
3. Michael Bauer, Henry Cook, and Brucek Khailany. 2011. CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (Seattle, Washington) (SC ’11). Association for Computing Machinery, New York, NY, USA, Article 12, 11 pages.
4. Michael Bauer. 2014. Singe: Leveraging Warp Specialization for High Performance on GPUs. SIGPLAN Not.
5. David A Beckingsale, Jason Burmark, Rich Hornung, Holger Jones, William Killian, Adam J Kunen, Olga Pearce, Peter Robinson, Brian S Ryujin, and Thomas RW Scogland. 2019. RAJA: Portable Performance for Large-scale Scientific Applications. In 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). IEEE, 71–81.