1. Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In International Conference on Machine Learning (ICML) (pp. 369–376).
2. Chorowski, J., Bahdanau, D., Cho, K., & Bengio, Y. (2014). End-to-end continuous speech recognition using attention-based recurrent NN: First results. arXiv preprint arXiv:1412.1602.
3. Graves, A., & Jaitly, N. (2014). Towards end-to-end speech recognition with recurrent neural networks. In International Conference on Machine Learning (ICML) (pp. 1764–1772).
4. Amodei, D., Anubhai, R., Battenberg, E., et al. (2015). Deep Speech 2: End-to-end speech recognition in English and Mandarin. arXiv preprint arXiv:1512.02595.
5. Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: An ASR corpus based on public domain audio books. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).