Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training

Author:

Moon Jong Hak¹, Lee Hyungyung¹, Shin Woncheol¹, Kim Young-Hak², Choi Edward¹

Affiliation:

1. Graduate School of AI, KAIST, Daejeon, South Korea

2. Department of Cardiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

Funder

Samsung

Institute of Information and Communications Technology Planning and Evaluation

Artificial Intelligence Graduate School Program

National Research Foundation of Korea

Korea government

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Subject

Health Information Management, Electrical and Electronic Engineering, Computer Science Applications, Health Informatics

Reference: 41 articles.

2. Google's neural machine translation system: Bridging the gap between human and machine translation; Wu, 2016

4. Learning video representations using contrastive bidirectional transformer; Sun, 2019

Cited by 53 articles.

5. RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning; Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2024-08-24