1. The effects of adding noise during backpropagation training on a generalization performance;An;Neural Computation,1996
2. The shattered gradients problem: If resnets are the answer, then what is the question?;Balduzzi,2017
3. Dual path networks;Chen,2017
4. Big batch SGD: Automated inference using adaptive batch sizes;De,2016
5. Adaptive subgradient methods for online learning and stochastic optimization;Duchi;Journal of Machine Learning Research,2011