1. Jeaugey, "NCCL 2.0," Proc. GPU Technol. Conf., 2017.
2. Sergeev, "Horovod: Fast and easy distributed deep learning in TensorFlow," 2018.
3. Jiang, "A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters," Proc. 14th USENIX Symp. Operating Syst. Des. Implementation, 2020.
4. "Scaling Distributed Machine Learning with the Parameter Server."
5. Gao, "Post: Device placement with cross-entropy minimization and proximal policy optimization," Proc. Adv. Neural Inf. Process. Syst., 2018.