Multi-Level Training and Testing of CNN Models in Diagnosing Multi-Center COVID-19 and Pneumonia X-ray Images
Published: 2023-09-13
Issue: 18
Volume: 13
Page: 10270
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Talaat Mohamed (1), Si Xiuhua (2), Xi Jinxiang (1)
Affiliation:
1. Department of Biomedical Engineering, University of Massachusetts, Lowell, MA 01854, USA
2. Department of Mechanical Engineering, California Baptist University, Riverside, CA 92504, USA
Abstract
This study aimed to address three questions in AI-assisted COVID-19 diagnostic systems: (1) How does a CNN model trained on one dataset perform on test datasets from disparate medical centers? (2) What accuracy gains can be achieved by enriching the training dataset with new images? (3) How can learned features elucidate classification results, and how do they vary among different models? To achieve these aims, four CNN models (AlexNet, ResNet-50, MobileNet, and VGG-19) were trained in five rounds by incrementally adding new images to a baseline training set comprising 11,538 chest X-ray images. In each round, the models were tested on four datasets with decreasing levels of image similarity. Notably, all models showed performance drops when tested on datasets containing outlier images or sourced from other clinics. In Round 1, 95.2–99.2% accuracy was achieved for the Level 1 testing dataset (i.e., from the same clinic but set apart for testing only), and 94.7–98.3% for Level 2 (i.e., from an external clinic but similar). However, model performance drastically decreased for Level 3 (i.e., outlier images with rotation or deformation), with the mean sensitivity plummeting from 99% to 36%. For the Level 4 testing dataset (i.e., from another clinic), accuracy decreased from 97% to 86%, and sensitivity from 99% to 67%. In Rounds 2 and 3, adding 25% and 50% of the outlier images to the training dataset improved the average Level-3 accuracy by 15% and 23% (i.e., from 56% to 71% to 83%). In Rounds 4 and 5, adding 25% and 50% of the external images increased the average Level-4 accuracy from 81% to 92% and 95%, respectively. Among the models, ResNet-50 demonstrated the most robust performance across the five-round training/testing phases, while VGG-19 persistently underperformed. Heatmaps and intermediate activation features showed visual correlations to COVID-19 and pneumonia X-ray manifestations but were insufficient to explicitly explain the classification. However, heatmaps and activation features at different rounds shed light on the progression of the models' learning behavior.
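The abstract reports accuracy and sensitivity separately, and the two diverge sharply on the harder test levels. The following is a minimal sketch of how the two metrics are computed and why sensitivity can fall much further than accuracy. It assumes a simplified binary labeling (1 = disease, 0 = normal) and uses hypothetical toy labels; the function name `accuracy_and_sensitivity` is illustrative and is not taken from the paper, whose models may well use more than two classes.

```python
import math


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for paired label sequences.

    Sensitivity (recall) = TP / (TP + FN), computed for the `positive` class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    # Sensitivity is undefined when the test set has no positive cases.
    sensitivity = true_pos / actual_pos if actual_pos else math.nan
    return accuracy, sensitivity


# Hypothetical toy data (NOT from the paper): 3 diseased and 7 normal images,
# with the classifier missing 2 of the 3 diseased cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

In this toy case most images are still classified correctly, yet two of the three positive cases are missed, so sensitivity is far lower than accuracy. This is the same qualitative pattern as the Level 3 and Level 4 results described above.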
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science