Biased Deep Learning Methods in Detection of COVID-19 Using CT Images: A Challenge Mounted by Subject-Wise-Split ISFCT Dataset

Author:

Parsarad Shiva (1,2), Saeedizadeh Narges (1,3), Soufi Ghazaleh Jamalipour (4), Shafieyoon Shamim (4), Hekmatnia Farzaneh (5), Zarei Andrew Parviz (5), Soleimany Samira (4), Yousefi Amir (4), Nazari Hengameh (4), Torabi Pegah (4), S. Milani Abbas (6), Madani Tonekaboni Seyed Ali (7), Rabbani Hossein (1), Hekmatnia Ali (4), Kafieh Rahele (1,8)

Affiliation:

1. Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan JM76+5M3, Iran

2. Law, Economics, and Data Science Group, Department of Humanities, Social and Political Science, ETH Zurich, 8092 Zurich, Switzerland

3. Institute for Intelligent Systems Research and Innovation, Deakin University, Melbourne, VIC 3125, Australia

4. Department of Radiology, School of Medicine, Isfahan University of Medical Sciences, Isfahan JM76+5M3, Iran

5. St. George’s Hospital, London SW17 0RE, UK

6. School of Engineering, University of British Columbia, Kelowna, BC V1V 1V7, Canada

7. Cyclica Inc., Toronto, ON M5J 1A7, Canada

8. Department of Engineering, Durham University, Durham DH1 3LE, UK

Abstract

Accurate detection of respiratory system damage, including COVID-19, is considered one of the crucial applications of deep learning (DL) models using CT images. However, the main shortcomings of published works have been unreliable reported accuracy and a lack of repeatability on new datasets, mainly due to slice-wise splits of the data, which create dependency between the training and test sets because data are shared across them. We introduce a new dataset of CT images (ISFCT Dataset) with labels indicating the subject-wise split, allowing us to train and test our DL algorithms in an unbiased manner. We also use this dataset to validate the real performance of published works under a subject-wise data split. Another key feature of the dataset is that it provides more specific labels (eight characteristic lung features) rather than being limited to COVID-19 and healthy labels. We show that the high accuracy reported for existing models on current slice-wise splits is not repeatable for subject-wise splits, and we demonstrate distribution differences between data splits using t-distributed stochastic neighbor embedding (t-SNE). We also show that, under subject-wise data splitting, less complicated models achieve competitive results compared to the existing complicated models, demonstrating that complex models do not necessarily generate accurate and repeatable results.
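
The difference between the two splitting strategies can be illustrated with a minimal sketch. This is not the authors' pipeline: the record fields (`subject_id`, `slice_index`, `label`), the helper names, and the toy data are all hypothetical and serve only to show why a slice-wise split lets the same subject appear in both sets while a subject-wise split does not.

```python
import random
from collections import defaultdict


def subject_wise_split(slice_records, test_fraction=0.2, seed=0):
    """Split so that all slices of a subject land in exactly one set."""
    by_subject = defaultdict(list)
    for record in slice_records:
        by_subject[record["subject_id"]].append(record)

    # Shuffle subjects (not slices), then hold out a fraction of them.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train = [r for s in subjects if s not in test_subjects for r in by_subject[s]]
    test = [r for s in subjects if s in test_subjects for r in by_subject[s]]
    return train, test


def slice_wise_split(slice_records, test_fraction=0.2, seed=0):
    """Shuffle individual slices, ignoring which subject they belong to."""
    shuffled = list(slice_records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def shared_subjects(train, test):
    """Return the subject IDs that occur in both sets (the leakage)."""
    return {r["subject_id"] for r in train} & {r["subject_id"] for r in test}


# Toy data (hypothetical): 10 subjects with 30 CT slices each.
records = [
    {"subject_id": f"S{p:02d}", "slice_index": k, "label": p % 2}
    for p in range(10)
    for k in range(30)
]

train, test = subject_wise_split(records)
leaky_train, leaky_test = slice_wise_split(records)

print("subject-wise overlap:", len(shared_subjects(train, test)))
print("slice-wise overlap:  ", len(shared_subjects(leaky_train, leaky_test)))
```

Because neighboring slices from one scan are highly similar, any subject that appears in both sets of the slice-wise split lets a model score well on the test set by recognizing the subject rather than the pathology. Grouping by subject before shuffling removes that shortcut.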

Funder

Vice Chancellery for Research and Technology, Isfahan University of Medical Sciences

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Computer Graphics and Computer-Aided Design; Computer Vision and Pattern Recognition; Radiology, Nuclear Medicine and Imaging

Cited by 2 articles.
