1. Bommasani, R., et al.: On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)
2. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763 (2021)
3. Yuan, L., et al.: Florence: a new foundation model for computer vision. arXiv preprint arXiv:2111.11432 (2021)
4. Pham, H., et al.: Combined scaling for open-vocabulary image classification. arXiv e-prints (2021)
5. Pourpanah, F., et al.: A review of generalized zero-shot learning methods. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)