1. D. Bahdanau et al., End-to-end attention-based large vocabulary speech recognition (2016).
2. Y. Bengio, N. Léonard, A. Courville, Estimating or propagating gradients through stochastic neurons for conditional computation, arXiv:1308.3432 (2013).
3. C. Buciluǎ et al., Model compression (2006).
4. W. Chen, J.T. Wilson, S. Tyree, K.Q. Weinberger, Y. Chen, Compressing neural networks with the hashing trick, CoRR, arXiv:1504.04788 (2015).
5. M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1 (2016).