1. A. Baevski et al., "wav2vec 2.0: A framework for self-supervised learning of speech representations," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2020.
2. V. Sanh et al., "DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter," arXiv:1910.01108, 2019.
3. K. He et al., "Masked autoencoders are scalable vision learners," arXiv:2111.06377, 2021.
4. G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv:1503.02531, 2015.
5. R. Bommasani et al., "On the opportunities and risks of foundation models," arXiv:2108.07258, 2021.