Handling Imbalance and Limited Data in Thyroid Ultrasound and Diabetic Retinopathy Datasets Using Discrete Levy Flights Grey Wolf Optimizer Based Random Forest for Robust Medical Data Classification
Published: 2024-02-16
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.
Author:
Aswal Shobha (1), Ahuja Neelu Jyothi (2), Mehra Ritika (3)
Affiliation:
1. Research Scholar, Computer Science and Engineering, VMSB Uttarakhand Technical University, Dehradun, India
2. School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
3. School of Computer Science and Engineering, Dev Bhoomi Uttarakhand University, Dehradun, India
Abstract
In disease diagnosis, medical image classification is hampered by data imbalance, variable image quality, annotation variability, limited data availability, and limited data representativeness. These factors degrade an algorithm's ability to classify medical images, leading to biased model outcomes and inaccurate interpretations. This paper combines a novel Discrete Levy Flight Grey Wolf Optimizer (DLFGWO) with the Random Forest (RF) classifier to address these limitations on biomedical datasets and to achieve a better classification rate. DLFGWO-RF addresses image quality variability in ultrasound images and limits RF classification inaccuracies by handling incomplete and noisy data. Focusing heavily on the majority class skews the distribution of classes and thus leads to data imbalance. DLFGWO balances this distribution by leveraging grey wolves, whose exploration and exploitation capabilities are improved using Discrete Levy Flight (DLF), and it further optimizes the classifier's performance to achieve a balanced classification rate. DLFGWO-RF is designed to perform classification even on limited datasets, which reduces the need for numerous expert annotations. In diabetic retinopathy grading, DLFGWO-RF reduces the disagreements arising from annotation variability in subjective interpretations. However, the diabetic retinopathy dataset does not capture the full diversity of the population, which limits the generalization ability of the proposed DLFGWO-RF. Fine-tuning the RF therefore lets it adapt robustly to the subgroups in the dataset, enhancing its overall performance. Experiments are conducted on two widely used medical image datasets to test the efficacy of the model. The results show that the DLFGWO-RF classifier achieves classification accuracy between 90% and 95%, outperforming existing techniques on various imbalanced datasets.
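The abstract does not give the DLFGWO update rules, so the following is only a minimal sketch of the general idea it names: a grey wolf search over a discrete grid of classifier settings, perturbed by Levy-distributed steps. It assumes the textbook Grey Wolf Optimizer update and Mantegna's algorithm for the Levy step, and it should not be read as the authors' method. All names (`levy_step`, `discrete_lf_gwo`, `toy_score`), the grid values, and the 0.1 step scale are hypothetical. `toy_score` stands in for what would, in practice, be a cross-validated score of a Random Forest trained with the decoded settings.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def discrete_lf_gwo(grid, objective, n_wolves=8, n_iter=30, seed=0):
    """Maximize `objective` over a grid; each wolf is a vector of grid indices.

    Assumes n_wolves >= 3 so that alpha, beta and delta leaders exist.
    """
    rng = random.Random(seed)
    sizes = [len(values) for values in grid]

    def decode(pos):
        return tuple(values[i] for values, i in zip(grid, pos))

    def snap(x, size):
        # Clip to the valid index range, then round to the nearest index.
        return int(round(min(size - 1, max(0.0, x))))

    wolves = [[rng.randrange(s) for s in sizes] for _ in range(n_wolves)]
    best_pos, best_score, history = None, -math.inf, []

    def rank_and_record():
        nonlocal best_pos, best_score
        ranked = sorted(wolves, key=lambda p: objective(decode(p)), reverse=True)
        leaders = [list(p) for p in ranked[:3]]  # alpha, beta, delta (copies)
        top = objective(decode(leaders[0]))
        if top > best_score:
            best_pos, best_score = list(leaders[0]), top
        history.append(best_score)
        return leaders

    for t in range(n_iter):
        leaders = rank_and_record()
        a = 2.0 * (1 - t / n_iter)  # shrinks from 2 towards 0
        for wolf in wolves:
            for d, size in enumerate(sizes):
                pulls = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pulls.append(leader[d] - A * abs(C * leader[d] - wolf[d]))
                # Hypothetical Levy perturbation relative to the alpha wolf.
                jump = 0.1 * levy_step(rng) * (wolf[d] - leaders[0][d])
                wolf[d] = snap(sum(pulls) / 3 + jump, size)
    rank_and_record()  # score the final positions as well
    return decode(best_pos), best_score, history


def toy_score(params):
    """Hypothetical stand-in for a cross-validated classifier score."""
    n_trees, max_depth = params
    return -abs(n_trees - 300) / 100 - abs(max_depth - 12) / 4


# Hypothetical grid: number of trees, maximum tree depth.
GRID = [[50, 100, 200, 300, 500], [4, 8, 12, 16, 20]]
```

Calling `discrete_lf_gwo(GRID, toy_score)` returns the best settings found, their score, and the best score recorded at each ranking step. Positions are stored as grid indices so that rounding and clipping always yield a valid setting, which is one simple way to make a continuous update rule discrete.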
Publisher
Association for Computing Machinery (ACM)