1. [1] A. V. Adinetz. CUDA Pro Tip: Optimized Filtering With Warp-Aggregated Atomics, PARALLEL FORALL. http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/, 2015.
2. [2] M. Billeter, O. Olsson, and U. Assarsson. Efficient Stream Compaction on Wide SIMD Many-Core Architectures. In Proceedings of the Conference on High Performance Graphics, 2009.
3. [3] G. Diamos, H. Wu, A. Lele, J. Wang, and S. Yalamanchili. Efficient relational algebra algorithms and data structures for GPU. Technical Report GIT-CERCS-12-01, CERCS, Georgia Institute of Technology, 2012.
4. [4] K. Garanzha, S. Premoze, A. Bely, and V. Galaktionov. Grid-based SAH BVH construction on a GPU. The Visual Computer, 27(6-8):697–706, 2011.
5. [5] M. Harris and M. Garland. Optimizing Parallel Prefix Operations for the Fermi Architecture. Chapter 3 of the book GPU Computing Gems - Jade Edition, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2011.