[1] T. Mikolov, A. Deoras, S. Kombrink, L. Burget, and J. Cernocky, “Empirical evaluation and combination of advanced language modeling techniques,” INTERSPEECH, pp.605-608, 2011.
[2] G.E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Trans. Audio Speech Lang. Process., vol.20, no.1, pp.30-42, 2012.
[3] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp.1097-1105, 2012.
[4] P.E. Utgoff and D.J. Stracuzzi, “Many-layered learning,” Neural Comput., vol.14, no.10, pp.2497-2529, 2002.
[5] G.F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” Advances in Neural Information Processing Systems, pp.2924-2932, 2014.