1. Scaling ASR Improves Zero and Few Shot Learning;xiao;Proc Interspeech 2022,2022
2. Training deep nets with sublinear memory cost;chen;arXiv preprint arXiv:1604.06174,2016
3. Improved Training of End-to-end Attention Models for Speech Recognition
4. FairScale: A general purpose modular PyTorch library for high performance and large scale training;authors,2021
5. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model