ADQE: Obtain Better Deep Learning Models by Evaluating the Augmented Data Quality Using Information Entropy
Published: 2023-09-28
Issue: 19
Volume: 12
Page: 4077
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Cui Xiaohui (1, 2), Li Yu (1, 2), Xie Zheng (1, 2), Liu Hanzhang (1), Yang Shijie (1), Mou Chao (1, 2)
Affiliation:
1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
2. Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China
Abstract
Data augmentation is a common technique in deep learning (DL) training, used primarily to mitigate overfitting, especially on small-scale datasets. However, it is difficult to evaluate whether an augmented dataset truly benefits model performance. Training a model every time to validate the quality of the augmentation and the dataset costs considerable time and resources. This article proposes a simple and practical approach to evaluating the quality of data augmentation for image classification tasks, enriching the theoretical research on data augmentation quality evaluation. Based on information entropy, metrics for augmented data quality are established along multiple dimensions, including diversity, class balance, and task relevance. Additionally, a comprehensive fusion metric for data augmentation quality is proposed. Experimental results on the CIFAR-10 and CUB-200 datasets show that our method maintains optimal performance in a variety of scenarios. The cosine similarity between the score of our method and the precision of the model is up to 99.9%. A rigorous evaluation of data augmentation quality is necessary to guide the improvement of DL model performance. The quality standards and evaluation defined in this article can be used by researchers to train high-performance DL models in situations where data are limited.
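As a rough illustration of the kind of entropy-based measure the abstract describes, the sketch below computes a normalized Shannon entropy over class labels as a class-balance score, and a cosine similarity between a list of quality scores and a list of model precisions. This is not the authors' ADQE formulation, which the abstract does not spell out; the function names `class_balance_entropy` and `cosine_similarity`, the normalization by log(k), and the sample values are all assumptions made for illustration.

```python
import math
from collections import Counter


def class_balance_entropy(labels):
    """Normalized Shannon entropy of the label distribution, in [0, 1].

    1.0 means perfectly balanced classes; 0.0 means a single class
    (or no labels). Hypothetical helper, not the paper's exact metric.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)  # divide by the maximum entropy for k classes


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Illustrative labels only: one balanced set, one skewed set.
    balanced = [0, 1, 2, 0, 1, 2]
    skewed = [0, 0, 0, 0, 1, 2]
    print(class_balance_entropy(balanced))
    print(class_balance_entropy(skewed))

    # Made-up quality scores and precisions for three augmentation schemes.
    scores = [0.62, 0.71, 0.80]
    precisions = [0.55, 0.66, 0.78]
    print(cosine_similarity(scores, precisions))
```

A score of this kind can be computed directly from the augmented labels, without training a model, which is the cost saving the abstract points to.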
Funder
Outstanding Youth Team Project of Central Universities
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.