Multimodal Large Language Models are Generalist Medical Image Interpreters

Published: 2023-12-22

Author:

Han Tianyu, Adams Lisa C., Nebelung Sven, Kather Jakob Nikolas, Bressem Keno K., Truhn Daniel

Abstract

Medicine is undergoing a transformation with the integration of Artificial Intelligence (AI). Traditional AI models, though clinically useful and often matching or surpassing expert clinicians in specific tasks, face a scalability challenge because an individual model must be developed for each task. Therefore, there is a push towards foundation models that are applicable to a wider set of tasks. Our study showcases how non-domain-specific, publicly available vision-language models can be employed as general foundation models for medical applications. We test our paradigm across four medical disciplines (pathology, dermatology, ophthalmology, and radiology), focusing on two use cases within each discipline. We find that our approach outperforms existing pre-training methods and is competitive with domain-specific foundation models that require vast amounts of domain-specific training images. We also find that large vision-language models are data-efficient and do not require large annotated datasets to reach competitive performance. This allows for the development of new or improved AI models in areas of medicine where data is scarce and will accelerate medical progress towards true multimodal foundation models.

Publisher

Cold Spring Harbor Laboratory
