Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images

Published: 2022-09-22 Issue: 1 Volume: 9
ISSN: 2052-4463
Container-title: Scientific Data
Language: en
Short-container-title: Sci Data

Author:

Tampu Iulian Emil, Eklund Anders, Haj-Hosseini Neda

Abstract

In the application of deep learning to optical coherence tomography (OCT) data, it is common to train classification networks using 2D images originating from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. Thus, an inappropriate data split can result in overlap between the training and testing sets, and a large portion of the literature overlooks this aspect. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for three classification tasks using three extensively used open-access OCT datasets: Kermany’s and Srinivasan’s ophthalmology datasets and the AIIMS breast tissue dataset. Results show that the classification performance is inflated by 0.07 up to 0.43 in terms of Matthews Correlation Coefficient (accuracy: 5% to 30%) for models tested on datasets with improper splitting, highlighting the considerable effect of dataset handling on model evaluation. This study intends to raise awareness of the importance of dataset splitting given the increased research interest in implementing deep learning on OCT data.
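The leakage mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy dataset, the `(volume_id, slice_index)` sample layout, and the function names `split_by_volume`, `split_by_image`, and `shared_volumes` are all assumptions made for illustration. The sketch contrasts a per-image split, where slices from one volume can land in both sets, with a per-volume split, where each volume is assigned to exactly one set.

```python
import random


def split_by_volume(samples, test_fraction=0.2, seed=0):
    """Split (volume_id, slice_index) pairs so that no volume spans both sets."""
    volumes = sorted({vol for vol, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(volumes)
    n_test = max(1, round(len(volumes) * test_fraction))
    test_volumes = set(volumes[:n_test])
    train = [s for s in samples if s[0] not in test_volumes]
    test = [s for s in samples if s[0] in test_volumes]
    return train, test


def split_by_image(samples, test_fraction=0.2, seed=0):
    """Naive per-image split: slices of one volume can land in both sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def shared_volumes(train, test):
    """Return the volume ids that appear in both the training and test sets."""
    return {vol for vol, _ in train} & {vol for vol, _ in test}


# Hypothetical toy dataset: 10 volumes with 50 consecutive 2D slices each.
samples = [(f"vol{v:02d}", i) for v in range(10) for i in range(50)]

train_v, test_v = split_by_volume(samples)
train_i, test_i = split_by_image(samples)

print("volumes shared, per-volume split:", len(shared_volumes(train_v, test_v)))
print("volumes shared, per-image split: ", len(shared_volumes(train_i, test_i)))
```

With the per-volume split the two sets share no volume by construction. With the per-image split, near-identical neighbouring slices of the same volume typically end up on both sides, which is the overlap the abstract identifies as the source of inflated test scores.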

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-022-01618-6.pdf

Reference: 44 articles.

1. Xu, Y. & Goodacre, R. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. Journal of analysis and testing 2, 249–262 (2018).

2. Kuhn, M. et al. Applied predictive modeling, vol. 26 (Springer, 2013).

3. Guyon, I. et al. A scaling law for the validation-set training-set size ratio. AT&T Bell Laboratories 1 (1997).

4. Refaeilzadeh, P., Tang, L. & Liu, H. Cross-validation. Encyclopedia of database systems 5, 532–538 (2009).

5. Litjens, G. et al. A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017).

Cited by 35 articles.

1. AI chatbots show promise but limitations on UK medical exam questions: a comparative performance study;Scientific Reports;2024-08-14

3. Comparison of Deep and Machine Learning Approaches for Quebec Tree Species Classification Using a Combination of Multispectral and LiDAR Data;Canadian Journal of Remote Sensing;2024-06-11

4. An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset;Healthcare Analytics;2024-06

5. Applying oversampling before cross-validation will lead to high bias in radiomics;Scientific Reports;2024-05-21