Abstract
Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities across various tasks and attracted increasing interest as natural language interfaces in many domains. Recently, large vision-language models (VLMs) that learn rich vision–language correlations from image–text pairs, such as BLIP-2 and GPT-4, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such an application would be valuable for objective performance evaluation and could supplement, or even replace, radiologists' opinions. To this end, this study introduces IQAGPT, an innovative computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels is professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions; the captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users prompt ChatGPT to rate image-quality scores or produce radiological quality reports. Our results demonstrate the feasibility of assessing image quality with LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
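To make the described pipeline concrete, the following minimal Python sketch illustrates the two text-generation steps outlined in the abstract: converting an annotated quality score into a semantically rich description via a prompt template, and wrapping the resulting caption in a request for ChatGPT to return a numeric score. The five-point score scale, the template wording, and the function names (score_to_text, build_scoring_prompt) are assumptions introduced for illustration; they are not the paper's actual annotation scale or prompts.

# Hypothetical sketch of the prompt-template steps described in the abstract.
# The score scale and wording below are assumptions, not the paper's prompts.

SCORE_DESCRIPTIONS = {  # assumed 5-point annotation scale
    1: "severe artifacts and noise; diagnostic value is poor",
    2: "noticeable artifacts; fine structures are partly obscured",
    3: "moderate quality; most anatomy is visible with some noise",
    4: "good quality; minor noise that does not affect diagnosis",
    5: "excellent quality; sharp structures and minimal noise",
}

def score_to_text(score: int) -> str:
    """Convert an annotated quality score into a rich text description,
    used as the caption target when fine-tuning the captioning VLM."""
    return f"This CT slice shows {SCORE_DESCRIPTIONS[score]}."

def build_scoring_prompt(quality_caption: str) -> str:
    """Wrap a VLM-generated quality caption in a request asking ChatGPT
    for a numeric score (report generation would use a similar prompt)."""
    return (
        "You are a radiological image-quality assessor. "
        f'Given the description: "{quality_caption}", '
        "rate the CT image quality on a scale of 1 (worst) to 5 (best) "
        "and reply with the number only."
    )

if __name__ == "__main__":
    caption = score_to_text(4)
    print(caption)
    print(build_scoring_prompt(caption))

In this sketch the caption text stands in for the fine-tuned VLM's output; in the actual system the description would be generated from the CT slice by the cross-modal captioning model before being passed to ChatGPT.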
Publisher
Springer Science and Business Media LLC