1. All you need is a good init;Mishkin,2015
2. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks;Saxe,2013
3. Learning long-term dependencies with gradient descent is difficult;Bengio,1994
4. On the difficulty of training recurrent neural networks;Pascanu,2013
5. A simple way to initialize recurrent networks of rectified linear units;Le,2015