1. Abadi M, Barham P, Chen J et al (2016) TensorFlow: A system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016, USENIX Association, pp 265–283
2. Awan AA, Subramoni H, Panda DK (2017) An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures. In: Proceedings of the machine learning on HPC environments, MLHPC@SC 2017, Denver, CO, USA, November 13, 2017, ACM, pp 8:1–8:8
3. Chetlur S, Woolley C, Vandermersch P et al (2014) cuDNN: Efficient primitives for deep learning. CoRR abs/1410.0759, arXiv:1410.0759
4. Chilimbi TM, Suzue Y, Apacible J et al (2014) Project Adam: Building an efficient and scalable deep learning training system. In: 11th USENIX symposium on operating systems design and implementation, OSDI ’14, Broomfield, CO, USA, October 6-8, 2014, USENIX Association, pp 571–582
5. Dean J, Corrado G, Monga R et al (2012) Large scale distributed deep networks. In: Advances in neural information processing systems 25: 26th annual conference on neural information processing systems 2012. Proceedings of a meeting held December 3–6, 2012, Lake Tahoe, Nevada, United States, pp 1232–1240