1. Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z et al (2023) A survey of large language models. arXiv preprint arXiv:2303.18223
2. Oquab M, Darcet T, Moutakanni T, Vo HV, Szafraniec M et al (2023) DINOv2: learning robust visual features without supervision. arXiv preprint arXiv:2304.07193
3. Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo W-Y et al (2023) Segment anything. In: ICCV
4. Zou X, Yang J, Zhang H, Li F, Li L, Wang J, Wang L, Gao J, Lee YJ (2023) Segment everything everywhere all at once. In: NeurIPS
5. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G et al (2021) Learning transferable visual models from natural language supervision. In: ICML