[1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding,” Proc. 22nd ACM International Conference on Multimedia, New York, NY, USA, pp.675-678, ACM, 2014. doi:10.1145/2647868.2654889
[2] H. Cui, H. Zhang, G.R. Ganger, P.B. Gibbons, and E.P. Xing, “GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server,” Proc. 11th European Conference on Computer Systems, pp.1-16, ACM Press, 2016. doi:10.1145/2901318.2901323
[3] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: A System for Large-Scale Machine Learning,” Proc. 12th USENIX Conf. on Operating Systems Design and Implementation, pp.265-283, USENIX, 2016.
[4] NVIDIA, “GPU-Based Deep Learning Inference: A Performance and Power Analysis,” whitepaper, 2015. http://developer.download.nvidia.com/embedded/jetson/TX1/docs/jetson_tx1_whitepaper.pdf
[5] N. Maruyama, T. Nomura, K. Sato, and S. Matsuoka, “Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers,” Proc. 2011 Int'l Conf. for High Performance Computing, Networking, Storage and Analysis, pp.11:1-11:12, ACM, 2011. doi:10.1145/2063384.2063398