1. Boecking, B., Usuyama, N., Bannur, S., Castro, D.C., Schwaighofer, A., Hyland, S., Wetscherek, M., Naumann, T., Nori, A., Alvarez-Valle, J., Poon, H., Oktay, O., 2022. Making the most of text semantics to improve biomedical vision-language processing. In: Proceedings of European Conference on Computer Vision. ECCV, pp. 1–21.
2. Chen, L., Bentley, P., Mori, K., Misawa, K., Fujiwara, M., Rueckert, D., 2019. Self-supervised learning for medical image analysis using image context restoration. Med. Image Anal. 58, 101539.
3. Chen, X., Changpinyo, S., Piergiovanni, A., Padlewski, P., Salz, D., Goodman, S., Grycner, A., Mustafa, B., Beyer, L., Kolesnikov, A., Puigcerver, J., Ding, N., Rong, K., Akbari, H., Mishra, G., Xue, L., Thapliyal, A., Bradbury, J., Kuo, W., Seyedhosseini, M., Jia, C., Ayan, B.K., Riquelme, C., Steiner, A., Angelova, A., Zhai, X., Houlsby, N., Soricut, R., 2023. PaLI: A jointly-scaled multilingual language-image model. In: Proceedings of International Conference on Learning Representations. ICLR.
4. Chen, Z., Du, Y., Hu, J., Liu, Y., Li, G., Wan, X., Chang, T., 2022. Multi-modal masked autoencoders for medical vision-and-language pre-training. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. MICCAI, Vol. 13435, pp. 679–689.
5. Devlin, J., Chang, M., Lee, K., Toutanova, K., 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT. pp. 4171–4186.