1. AST: Audio Spectrogram Transformer. Gong et al., Interspeech 2021.
2. PSLA: Improving audio event classification with pretraining, sampling, labeling, and aggregation. Gong et al., 2021.
3. Multimodal self-supervised learning of general audio representations. Wang et al., 2021.
4. Self-supervised multimodal versatile networks. Alayrac et al., NeurIPS 2020.
5. Contrastive learning of musical representations. Spijkervet and Burgoyne, 2021.