1. Lecture 6.5-RMSPROP: Divide the gradient by a running average of its recent magnitude;tieleman;COURSERA Neural Netw Mach Learn,2012
2. Deep sparse rectifier neural networks;glorot;Proc AISTATS,2011
3. Squeeze-and-Excitation Networks
4. Understanding the difficulty of training deep feedforward neural networks;glorot;Proc AISTATS,2010