Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Authors:

Li Cheng-Yi1, Chang Kao-Jung2, Yang Cheng-Fu3, Wu Hsin-Yu4, Chen Wenting5, Bansal Hritik3, Chen Ling6, Yang Yi-Ping7, Chen Yu-Chun8, Chen Shih-Pin9, Lirng Jiing-Feng10, Chang Kai-Wei3, Chiou Shih-Hwa11

Affiliation:

1. Department of Computer Science, University of California, Los Angeles; School of Medicine, College of Medicine, National Yang Ming Chiao Tung University; Department of Medical Research, Taipei Veterans General Hospital

2. Department of Medical Research, Taipei Veterans General Hospital; Institute of Clinical Medicine, National Yang Ming Chiao Tung University; Big Data Center, Taipei Veterans General Hospital; Department of Ophthalmology, Taipei Veterans General Hospital

3. Department of Computer Science, University of California, Los Angeles

4. School of Medicine, College of Medicine, National Yang Ming Chiao Tung University; Department of Medical Research, Taipei Veterans General Hospital

5. Department of Electrical Engineering, City University of Hong Kong

6. Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University

7. Department of Medical Research, Taipei Veterans General Hospital

8. School of Medicine, College of Medicine, National Yang Ming Chiao Tung University; Big Data Center, Taipei Veterans General Hospital; Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University; Department of Family Medicine, Taipei Veterans General Hospital

9. Institute of Clinical Medicine, National Yang Ming Chiao Tung University; Department of Neurology, Neurological Institute, Taipei Veterans General Hospital

10. Department of Radiology, School of Medicine, National Yang Ming Chiao Tung University; College of Medicine, National Yang Ming Chiao Tung University

11. Department of Medical Research, Taipei Veterans General Hospital; Institute of Clinical Medicine, National Yang Ming Chiao Tung University; Department of Ophthalmology, Taipei Veterans General Hospital

Abstract

Multimodal large language models (MLLMs) have been given free rein to explore exciting medical applications, with a primary focus on radiology report generation. Nevertheless, preliminary MLLM successes in captioning 2D medical image-text pairs do not reflect the real-world diagnostic challenge posed by volumetric 3D anatomy. Toward deploying MLLMs in a more applicable diagnostic context, we observed that (1) the scarcity of 3D image training datasets, (2) the direct use of undifferentiated foundation MLLMs, and (3) the lack of pertinent caption evaluation metrics are domain-specific constraints that collectively hobble the iteration of next-generation medical MLLM research. Accordingly, this study collected a 3D-BrainCT dataset (18,885 text-scan pairs) and applied clinical visual instruction tuning (CVIT) to train volumetric-anatomy-sensible BrainGPT models that generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT model scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing, and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned reports, we found that the traditional metrics measure only surface text similarity and fail to gauge the information density relevant to the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the clinical relevance (lesion features and landmarks) of a report. Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree = 0.661; landmark = 0.706; feature = 0.693; impression = 0.779).
To demonstrate that BrainGPT models are objectively ready to generate human-like radiology reports, we conducted a Turing test enrolling 11 physician evaluators; around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. While various computational intelligence researchers have advocated avant-garde MLLM applications, our work embodies a holistic framework that shares first-hand experience in curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics. We believe that docking MLLMs for 3D brain CT report generation may unfold new MLLM applications at the forefront of human-machine collaborative modern healthcare.
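The abstract reports FORTE as a category-wise F1-score over clinically meaningful report elements (degree, landmark, feature, impression). A minimal sketch of that style of evaluation is shown below; the keyword lexicons and the simple token-matching extraction here are illustrative assumptions, not the authors' actual FORTE lexicon or pipeline.

```python
# Sketch of a FORTE-style evaluation: compare category-specific keywords
# between a generated report and a radiologist reference, then score each
# category with an F1. The LEXICON entries below are hypothetical examples.

def keyword_f1(pred_keywords, ref_keywords):
    """F1 over two keyword sets; defined as 1.0 when both sets are empty."""
    pred, ref = set(pred_keywords), set(ref_keywords)
    if not pred and not ref:
        return 1.0
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical category lexicons (illustrative only).
LEXICON = {
    "degree": {"mild", "moderate", "severe"},
    "landmark": {"ventricle", "midline", "sulci"},
    "feature": {"hemorrhage", "infarct", "edema"},
    "impression": {"acute", "chronic", "normal"},
}

def extract(report, vocab):
    """Naive keyword extraction by exact token match."""
    tokens = report.lower().split()
    return {t for t in tokens if t in vocab}

def forte_scores(pred_report, ref_report):
    """Per-category F1 between generated and reference reports."""
    return {
        cat: keyword_f1(extract(pred_report, vocab), extract(ref_report, vocab))
        for cat, vocab in LEXICON.items()
    }

pred = "moderate midline shift with acute hemorrhage"
ref = "severe midline shift with acute hemorrhage and edema"
scores = forte_scores(pred, ref)
```

In this toy example the landmark and impression categories match exactly (F1 = 1.0), the degree category disagrees ("moderate" vs. "severe", F1 = 0.0), and the feature category is partially recovered (F1 = 2/3), illustrating how keyword-level scoring captures clinical content that surface-similarity metrics like BLEU smooth over.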

Publisher

Springer Science and Business Media LLC

References: 41 articles.

