1. Baevski A, Hsu W, Xu Q, Babu A, Gu J, Auli M (2022) data2vec: A general framework for self-supervised learning in speech, vision and language. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 1298–1312. PMLR. https://proceedings.mlr.press/v162/baevski22a.html
2. Bahng H, Jahanian A, Sankaranarayanan S, Isola P (2022) Exploring Visual Prompts for Adapting Large-Scale Models
3. Bao H, Dong L, Piao S, Wei F (2022) BEiT: BERT pre-training of image transformers. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=p-BhZSz59o4
4. Bossard L, Guillaumin M, Van Gool L (2014) Food-101 - mining discriminative components with random forests. In: Fleet, D.J., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VI. Lecture Notes in Computer Science, vol. 8694, pp. 446–461. Springer. https://doi.org/10.1007/978-3-319-10599-4_29
5. Bulat A, Tzimiropoulos G (2023) LASP: text-to-text optimization for language-aware soft prompting of vision & language models. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pp. 23232–23241. IEEE. https://doi.org/10.1109/CVPR52729.2023.02225