1. Baevski, A., Zhou, H., Mohamed, A., Auli, M.: wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv. Neural. Inf. Process. Syst. 33, 12449–12460 (2020)
2. Vaswani, A., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 5998–6008 (2017)
3. Chiu, C.-C., et al.: State-of-the-art speech recognition with sequence-to-sequence models. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4774–4778 (2018)
4. Zaheer, M., et al.: Big bird: transformers for longer sequences. Adv. Neural. Inf. Process. Syst. 33, 17283–17297 (2020)
5. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 7871–7880 (2020)