1. Large scale distributed deep networks;Dean;Advances in Neural Information Processing Systems, 2012
2. Parallelized stochastic gradient descent;Zinkevich;Advances in Neural Information Processing Systems 23, 2010
3. Communication-computation efficient gradient coding;Ye
4. Slow and stale gradients can win the race: Error-runtime trade-offs in distributed SGD;Dutta
5. Distributed Gradient Descent with Coded Partial Gradient Computations