1. Accelerating reduction and scan using tensor core units
2. Toward accelerated stencil computation by adapting tensor core unit on GPU
3. Leonardo Mattes and Sergio Kofuji. 2010. Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids. In 2010 International Conference on Microwave and Millimeter Wave Technology. IEEE, 1536–1539.
4. Hiroko Midorikawa, Hideyuki Tan, and Toshio Endo. 2014. An evaluation of the potential of flash SSD as large and slow memory for stencil computations. In 2014 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 268–277.
5. Takeshi Minami, Motoharu Hibino, Tasuku Hiraishi, Takeshi Iwashita, and Hiroshi Nakashima. 2015. Automatic parameter tuning of three-dimensional tiled FDTD kernel. In High Performance Computing for Computational Science – VECPAR 2014: 11th International Conference, Eugene, OR, USA, June 30–July 3, 2014, Revised Selected Papers 11. Springer, 284–297.