Boosting meta-learning with simulated data complexity measures

Published:2020-09-30 Issue:5 Volume:24 Page:1011-1028
ISSN:1088-467X
Container-title:Intelligent Data Analysis
Short-container-title:IDA

Author:

Garcia Luís P.F.¹, Rivolli Adriano², Alcoba Edesio³, Lorena Ana C.⁴, de Carvalho André C.P.L.F.³

Affiliation:

1. Department of Computer Science, University of Brasília, Brasília, Brazil

2. Computing Department, Technological University of Paraná, Paraná, Brazil

3. Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, Brazil

4. Aeronautics Institute of Technology, Praça Marechal Eduardo Gomes, São Paulo, Brazil

Abstract

Meta-Learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, for Meta-Learning to be computationally efficient, extracting the meta-feature values should also have a low computational cost, considering the trade-off between the time spent running all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes, and the distribution of the instances within the classes. However, extracting these measures from datasets usually has a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures while preserving their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems that use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, while the computational cost of obtaining them is significantly reduced.
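The idea in the abstract, predicting an expensive data complexity measure from cheaper meta-features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' system: the synthetic two-class Gaussian datasets, the three cheap meta-features, the choice of N3 (the leave-one-out error rate of a 1-NN classifier) as the expensive measure, and the k-NN regressor used as the meta-model. The function names `make_dataset`, `cheap_meta_features`, `n3_measure` and `knn_predict` are hypothetical.

```python
import random
import statistics


def make_dataset(rng, n, d, gap):
    """Synthetic data (assumption): two Gaussian classes whose means differ by `gap`."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label * gap, 1.0) for _ in range(d)])
        y.append(label)
    return X, y


def cheap_meta_features(X, y):
    """O(n*d) descriptors: size, dimensionality, mean standardized class-mean gap."""
    n, d = len(X), len(X[0])
    gaps = []
    for j in range(d):
        col = [row[j] for row in X]
        c0 = [v for v, label in zip(col, y) if label == 0]
        c1 = [v for v, label in zip(col, y) if label == 1]
        gaps.append(abs(statistics.mean(c0) - statistics.mean(c1)) / statistics.pstdev(col))
    return [float(n), float(d), statistics.mean(gaps)]


def n3_measure(X, y):
    """Expensive target: leave-one-out 1-NN error rate, O(n^2 * d)."""
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        errors += best_label != y[i]
    return errors / len(X)


def knn_predict(meta_base, query, k=3):
    """Meta-model (assumption): mean target of the k nearest meta-examples
    after min-max scaling of the meta-features."""
    feats = [f for f, _ in meta_base]
    lo = [min(col) for col in zip(*feats)]
    hi = [max(col) for col in zip(*feats)]

    def scale(v):
        return [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(v, lo, hi)]

    q = scale(query)
    ranked = sorted(
        meta_base,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(scale(item[0]), q)),
    )
    return statistics.mean([target for _, target in ranked[:k]])


# Build the meta-base once: cheap meta-features paired with the measured N3 value.
rng = random.Random(0)
meta_base = []
for _ in range(30):
    X, y = make_dataset(rng, n=rng.choice([40, 60, 80]), d=rng.choice([2, 3]),
                        gap=rng.uniform(0.0, 3.0))
    meta_base.append((cheap_meta_features(X, y), n3_measure(X, y)))

# For a new dataset, only the cheap meta-features are needed to estimate N3.
X_new, y_new = make_dataset(rng, n=60, d=2, gap=2.5)
predicted = knn_predict(meta_base, cheap_meta_features(X_new, y_new))
actual = n3_measure(X_new, y_new)  # computed here only to compare with the estimate
print(f"predicted N3 = {predicted:.3f}, measured N3 = {actual:.3f}")
```

In this sketch the quadratic-cost measure is computed only while building the meta-base. At prediction time the new dataset is described by the linear-cost meta-features alone, which is the cost saving the abstract describes.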

Publisher

IOS Press

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science

Reference: 50 articles.

1. V.H. Barella, L.P.F. Garcia, M.P. de Souto, A.C. Lorena and A.C.P.L.F. de Carvalho, Data complexity measures for imbalanced classification tasks, In International Joint Conference on Neural Networks (IJCNN), volume 1, 2018, pp. 1–8.

2. H. Bensusan, C. Giraud-Carrier and C. Kennedy, A higher-order approach to meta-learning, Technical report, University of Bristol, 2000.

3. H. Bensusan and A. Kalousis, Estimating the predictive accuracy of a classifier. In 12th European Conference on Machine Learning (ECML), volume 2167, 2001, pp. 25–36.

4. P. Brazdil, C. Giraud-Carrier, C. Soares and R. Vilalta, Metalearning – Applications to Data Mining, Cognitive Technologies, Springer, 1st edition, 2009.

5. Brazdil, Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results, Machine Learning, 2003.

Cited by 8 articles.

1. Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks;ACM Transactions on Software Engineering and Methodology;2024-06-27

2. Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem;Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing;2023-08-17

3. Feature Selection for Portable Spectral Sensing Data of Soil Using Broad Learning Network in Fusion with Fuzzy Technique;IEEE Sensors Journal;2023

4. A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research;Information Fusion;2023-01

5. Dynamic selection of classifiers based on complexity measures;2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI);2022-10