Improving Active Learning Performance through the Use of Data Augmentation

Published: 2023-02-20 Issue: Volume: 2023 Page: 1-17
ISSN:1098-111X
Container-title:International Journal of Intelligent Systems
language:en
Short-container-title:International Journal of Intelligent Systems

Author:

Fonseca Joao¹, Bacao Fernando¹

Affiliation:

1. NOVA Information Management School, Universidade Nova de Lisboa, Lisbon, Portugal

Abstract

Active learning (AL) is a well-known technique to optimize data usage in training, through the interactive selection of unlabeled observations, out of a large pool of unlabeled data, to be labeled by a supervisor. Its focus is to find the unlabeled observations that, once labeled, will maximize the informativeness of the training dataset, therefore reducing data-related costs. The literature describes several methods to improve the effectiveness of this process. Nonetheless, there is a paucity of research on the application of artificial data sources in AL, especially outside image classification or NLP. This paper proposes a new AL framework, which relies on the effective use of artificial data. It may be used with any classifier, generation mechanism, and data type and can be integrated with multiple other state-of-the-art AL contributions. This combination is expected to increase the ML classifier’s performance and reduce both the supervisor’s involvement and the amount of required labeled data at the expense of a marginal increase in computational time. The proposed method introduces a hyperparameter optimization component to improve the generation of artificial instances during the AL process as well as an uncertainty-based data generation mechanism. We compare the proposed method, which aims at more informed data generation in an AL context, to the standard framework and to an oversampling-based active learning method. The models’ performance was tested using four different classifiers, two AL-specific performance metrics, and three classification performance metrics over 15 different datasets. We demonstrated that the proposed framework, using data augmentation, significantly improved the performance of AL, both in terms of classification performance and data selection efficiency (all code and preprocessed data developed for this study are available at https://github.com/joaopfonseca/publications/).
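The general idea in the abstract, a pool-based AL loop in which the labeled set is augmented with artificial observations before each classifier fit, can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the Gaussian-jitter generator, and all function names and parameters (`fit_centroids`, `predict_proba`, `augment`, `active_learning`, `n_copies`, `scale`, `batch`) are assumptions chosen to keep the example self-contained. The hyperparameter optimization component and the uncertainty-based generation mechanism described in the abstract are not reproduced here.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Stand-in classifier (assumption): one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def augment(X, y, rng, n_copies=2, scale=0.1):
    """Stand-in generator (assumption): Gaussian-jittered copies of labeled rows."""
    noisy = [X + rng.normal(0.0, scale, size=X.shape) for _ in range(n_copies)]
    return np.concatenate([X] + noisy), np.concatenate([y] * (n_copies + 1))

def active_learning(X_pool, y_oracle, init_idx, n_iter=5, batch=5, seed=0):
    """Pool-based AL with augmentation before each fit.

    Assumes init_idx contains at least one observation of every class.
    y_oracle plays the role of the supervisor: labels are only read
    for indices that have been queried.
    """
    rng = np.random.default_rng(seed)
    labeled = list(init_idx)
    n_classes = int(y_oracle.max()) + 1
    for _ in range(n_iter):
        # 1. Augment the current labeled set and fit the classifier on it.
        X_aug, y_aug = augment(X_pool[labeled], y_oracle[labeled], rng)
        centroids = fit_centroids(X_aug, y_aug, n_classes)
        # 2. Score the unlabeled pool by least-confidence uncertainty.
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if len(unlabeled) == 0:
            break
        proba = predict_proba(centroids, X_pool[unlabeled])
        uncertainty = 1.0 - proba.max(axis=1)
        # 3. Query the supervisor for the most uncertain observations.
        query = unlabeled[np.argsort(-uncertainty)[:batch]]
        labeled.extend(query.tolist())
    # Final fit on everything labeled so far (again with augmentation).
    X_aug, y_aug = augment(X_pool[labeled], y_oracle[labeled], rng)
    return fit_centroids(X_aug, y_aug, n_classes), labeled

# Usage on synthetic data (two well-separated Gaussian blobs).
data_rng = np.random.default_rng(0)
X = np.concatenate([data_rng.normal(-2.0, 1.0, (50, 2)),
                    data_rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
centroids, labeled = active_learning(X, y, init_idx=[0, 50])
accuracy = float((predict_proba(centroids, X).argmax(axis=1) == y).mean())
print(f"labeled {len(labeled)} of {len(X)} observations, accuracy {accuracy:.2f}")
```

Any classifier exposing class probabilities and any data generator could be swapped in for the two stand-ins, which mirrors the abstract's claim that the framework is agnostic to classifier and generation mechanism.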

Funder

Fundação para a Ciência e a Tecnologia

Publisher

Hindawi Limited

Subject

Artificial Intelligence, Human-Computer Interaction, Theoretical Computer Science, Software

Link

http://downloads.hindawi.com/journals/ijis/2023/7941878.pdf

References: 63 articles.

1. Diminishing Uncertainty Within the Training Pool: Active Learning for Medical Image Segmentation

2. A review of active learning approaches to experimental design for uncovering biological networks

3. Active Learning for Hierarchical Text Classification

4. SEAL: Semisupervised Adversarial Active Learning on Attributed Graphs

5. Rethinking deep active learning: using unlabeled data at model training;O. Siméoni,2020

Cited by 3 articles.

1. Exploring Data Augmentation and Active Learning Benefits in Imbalanced Datasets;Mathematics;2024-06-19

2. Active Learning with Aggregated Uncertainties from Image Augmentations;Communications in Computer and Information Science;2024

3. Optimizing Sustainability: A Deep Learning Approach on Data Augmentation of Indonesia Palm Oil Products Emission;2023-12-06