Exploring the influence of transformer-based multimodal modeling on clinicians’ diagnosis of skin diseases: A quantitative analysis

Published: 2024-01 Volume: 10
ISSN: 2055-2076
Container-title: DIGITAL HEALTH
Language: en

Author:

Zhang Yujiao¹, Hu Yunfeng¹, Li Ke², Pan Xiangjun¹, Mo Xiaoling¹, Zhang Hong¹

Affiliation:

1. Department of Dermatology, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China

2. School of the First Clinical Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China

Abstract

Objectives The study aimed to propose a multimodal model that incorporates both macroscopic and microscopic images and to analyze its influence on the decision-making of clinicians with different levels of experience.
Methods First, we constructed a multimodal dataset for five skin disorders. Next, we trained unimodal models on three different types of images and selected the best-performing models as the base learners. Then, we used a soft voting strategy to create the multimodal model. Finally, 12 clinicians were divided into three groups, each comprising one director dermatologist, one dermatologist-in-charge, one resident dermatologist, and one general practitioner. They were asked to diagnose the skin disorders in four unaided situations (macroscopic images only; dermatopathological images only; macroscopic and dermatopathological images; all images and metadata) and three aided situations (macroscopic images with the aid of model 1; dermatopathological images with the aid of models 2 and 3; all images with the aid of multimodal model 4). The clinicians’ diagnostic accuracy and the time taken for each diagnosis were recorded.
Results Among the trained models, the vision transformer (ViT) achieved the best performance, with accuracies of 0.8636, 0.9545, and 0.9673 and AUCs of 0.9823, 0.9952, and 0.9989, respectively, on the training set. On the external validation set, however, these models achieved accuracies of only 0.70, 0.90, and 0.94, respectively. The multimodal model outperformed the unimodal models, achieving an accuracy of 0.98 on the external validation set. The logit regression analysis indicates that all models help clinicians make diagnostic decisions [odds ratio (OR) > 1], whereas metadata does not (OR < 1). The linear analysis indicates that metadata significantly increases clinicians’ diagnosis time (P < 0.05), whereas model assistance does not (P > 0.05).
Conclusions The results of this study suggest that the multimodal model effectively improves clinicians’ diagnostic performance without significantly increasing the diagnostic time. However, further large-scale prospective studies are necessary.
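
The soft voting step named in the Methods can be illustrated with a minimal sketch. The abstract does not state how the base learners are weighted, so equal weights are assumed here; the function names (`soft_vote`, `predict`) and all probability values are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence


def soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-class probabilities.

    `probabilities` holds one probability vector per base learner.
    Equal weights are an assumption; the abstract does not specify them.
    """
    if not probabilities:
        raise ValueError("need at least one base learner")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all base learners must score the same classes")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("need exactly one weight per base learner")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict(
    probabilities: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probabilities, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical outputs of three unimodal models for one case, five classes.
macroscopic = [0.40, 0.35, 0.10, 0.10, 0.05]
pathology_a = [0.20, 0.60, 0.10, 0.05, 0.05]
pathology_b = [0.10, 0.70, 0.10, 0.05, 0.05]

ensemble_class = predict([macroscopic, pathology_a, pathology_b])
```

In this made-up case the macroscopic model alone would favor class 0, while the averaged probabilities favor class 1. This is the kind of correction soft voting allows when the base learners disagree.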

Funder

Guangzhou Municipal Science and Technology Bureau

Jinan University

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/20552076241257087
