Hostility measure for multi-level study of data complexity

Published:2022-07-26 Issue:7 Volume:53 Page:8073-8096
ISSN:0924-669X
Container-title:Applied Intelligence
Language:en
Short-container-title:Appl Intell

Author:

Lancho Carmen, Martín De Diego Isaac, Cuesta Marina, Aceña Víctor, Moguerza Javier M.

Abstract

Complexity measures aim to characterize the underlying complexity of supervised data. These measures address factors that hinder the performance of Machine Learning (ML) classifiers, such as overlap, density, and linearity. The state of the art has mainly focused on the dataset perspective of complexity, i.e., offering an estimation of the complexity of the whole dataset. Recently, the instance perspective has also been addressed. This paper proposes the hostility measure, a complexity measure offering a multi-level (instance, class, and dataset) perspective of data complexity. The proposal is built by estimating the novel notion of hostility: the difficulty of correctly classifying a point, a class, or a whole dataset given their corresponding neighborhoods. The proposed measure is estimated at the instance level by applying the k-means algorithm in a recursive and hierarchical way, which makes it possible to analyze how points from different classes are naturally grouped together across partitions. The instance information is then aggregated to provide complexity knowledge at the class and dataset levels. The validity of the proposal is evaluated through a variety of experiments covering the three perspectives, together with a comparison against state-of-the-art measures. Throughout the experiments, the hostility measure has shown promising results and proved to be competitive, stable, and robust.
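The idea in the abstract (recursive, hierarchical k-means at the instance level, then aggregation to class and dataset levels) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give the details, so the reduction factor `sigma`, the per-level score (share of other-class points in a point's cluster), the averaging over levels, and the mean-based aggregation are all assumptions made here for illustration. The function names `kmeans` and `hostility` are hypothetical.

```python
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on coordinate tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def hostility(X, y, sigma=2):
    """Illustrative multi-level score; sigma and the scoring rule are assumptions."""
    if sigma < 2:
        raise ValueError("sigma must be at least 2 so each level shrinks")
    n = len(X)
    positions = [tuple(p) for p in X]        # every point starts as its own group
    counts = [Counter([lab]) for lab in y]   # class counts per group
    member_of = list(range(n))               # group index of each original point
    per_level = [[] for _ in range(n)]
    while len(positions) // sigma >= 2:
        k = len(positions) // sigma
        labels, centers = kmeans(positions, k)
        used = sorted(set(labels))           # drop empty clusters
        remap = {c: i for i, c in enumerate(used)}
        merged = [Counter() for _ in used]
        for g, c in enumerate(labels):
            merged[remap[c]] += counts[g]
        member_of = [remap[labels[g]] for g in member_of]
        positions = [centers[c] for c in used]   # next level clusters the centers
        counts = merged
        for i in range(n):
            grp = counts[member_of[i]]
            # Share of points in this group that belong to other classes.
            per_level[i].append(1 - grp[y[i]] / sum(grp.values()))
    instance = [sum(s) / len(s) if s else 0.0 for s in per_level]
    by_class = {}
    for c in set(y):
        vals = [h for h, lab in zip(instance, y) if lab == c]
        by_class[c] = sum(vals) / len(vals)
    dataset = sum(instance) / n
    return instance, by_class, dataset

# Two far-apart classes versus two classes sharing the same locations.
separated_X = [(0, 0), (0, 1), (1, 0), (1, 1), (100, 100), (100, 101), (101, 100), (101, 101)]
separated_y = [0, 0, 0, 0, 1, 1, 1, 1]
overlapped_X = [(0, 0), (0, 0), (5, 5), (5, 5), (0, 5), (0, 5), (5, 0), (5, 0)]
overlapped_y = [0, 1, 0, 1, 0, 1, 0, 1]
print(hostility(separated_X, separated_y)[2], hostility(overlapped_X, overlapped_y)[2])
```

Under these assumptions, the fully overlapped toy data scores higher at the dataset level than the well-separated data, which matches the intuition of hostility as classification difficulty given a point's neighborhood.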

Funder

Universidad Rey Juan Carlos

Comunidad de Madrid

Ministerio de Ciencia, Innovación y Universidades

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence

Link

https://link.springer.com/content/pdf/10.1007/s10489-022-03793-w.pdf

References: 39 articles.

1. Arruda J L, Prudêncio R B, Lorena A C (2020) Measuring instance hardness using data complexity measures. In: Brazilian conference on intelligent systems. Springer, pp 483–497