1. Robust Speech Recognition via Large-scale Weak Supervision;Radford;arXiv preprint arXiv:2212.04356,2022
2. LLaMA: Open and Efficient Foundation Language Models;Touvron;arXiv preprint arXiv:2302.13971,2023
3. Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers;Gong;Interspeech,2023
4. Language Models are Few-Shot Learners;Brown;NeurIPS,2020
5. Listen, Think, and Understand;Gong;arXiv preprint arXiv:2305.10790,2023