1. Flamingo: a visual language model for few-shot learning;Alayrac;Advances in Neural Information Processing Systems,2022
2. Segment anything in 3D with NeRFs;Cen;CoRR,2023
3. Localizing Visual Sounds the Hard Way
4. VGGSound: A Large-Scale Audio-Visual Dataset
5. Segment anything model (SAM) enhanced pseudo labels for weakly supervised semantic segmentation;Chen;CoRR,2023