1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) TensorFlow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association, Savannah, GA, pp 265–283, https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi
2. AMD (2018) hcBLAS library. https://gpuopen.com/compute-product/hcblas/
3. Chen J, Pan X, Monga R, Bengio S, Jozefowicz R (2016a) Revisiting distributed synchronous SGD. arXiv preprint arXiv:1604.00981
4. Chen T, Li M, Li Y, Lin M, Wang N, Wang M, Xiao T, Xu B, Zhang C, Zhang Z (2015) MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR arXiv:1512.01274
5. Chen T, Xu B, Zhang C, Guestrin C (2016b) Training deep nets with sublinear memory cost. CoRR arXiv:1604.06174