1. Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30, 5998-6008. https://arxiv.org/abs/1706.03762
2. Firth, J. R. (1957). A Synopsis of Linguistic Theory, 1930-1955. In Studies in Linguistic Analysis. Oxford: Blackwell.
3. Lüscher, C., et al. (2019). RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation, 1-5. https://arxiv.org/abs/1905.03072
4. Devlin, J., et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
5. Xu, Q., et al. (2020). Iterative Pseudo-Labeling for Speech Recognition, 1-13. https://arxiv.org/abs/2005.09267