1. Abdelfattah, R., Wang, X, Wang, S.: TTPLA: An Aerial-Image Dataset for Detection and Segmentation of Transmission Towers and Power Lines. In: Proceedings of the Asian Conference on Computer Vision (2020)
2. Radford, A., et al.: Learning Transferable Visual Models From Natural Language Supervision. In: Proceedings of the International Conference on ML, pp. 8748–8763 (2021)
3. Wang, Z., Wu, Z., Agarwal, D., Sun, J.: MedCLIP: Contrastive learning from unpaired medical images and text. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 3876–3887 (2022)
4. Deng, Y., Campbell, R., Kumar, P.: Fire and Gun Detection Based on Sematic Embeddings. In: IEEE International Conference on Multimedia (2022)
5. Endo, M., Krishnan, R., Krishna, V., Ng, A.Y., Rajpurkar, P.: Retrieval-based chest x-ray report generation using a pre-trained contrastive language-image model. In: Proceedings of ML Research, vol. 158. PMLR, pp. 209–219, Nov. 28 (2021)