Abstract
Background
Selecting the k best features is a common task in machine learning. Typically, a few variables have high importance while many have low importance, which gives a right-skewed distribution. This report proposes a numerically precise method that addresses this skewed distribution of feature importance in order to reduce a feature set to its informative minimum.
Methods
Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important elements. It divides a set of non-negative numerical elements into subsets "A", "B" and "C" such that subset "A" contains the "few important" items; the limits between the subsets are derived from specific properties of ABC curves, which are defined by their relationship to Lorenz curves. In its recursive form, cABC analysis can be applied again to subset "A". Experiments were performed on a generic image dataset and three biomedical datasets (one lipidomics and two genomics datasets) with a large number of variables.
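The partition step can be illustrated with a short sketch. It is a simplified illustration, not the implementation in the cABCanalysis package: it assumes that the A|B limit is the point of the unsmoothed ABC curve nearest the ideal point (0, 1) (x = fraction of items taken in descending order, y = fraction of the total sum covered). It omits curve interpolation and the B|C limit. The function names `abc_curve` and `set_a` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def abc_curve(values: Sequence[float]) -> Tuple[List[int], List[float], List[float]]:
    """Return the descending item order plus effort (x) and yield (y) of the ABC curve."""
    if any(v < 0 for v in values):
        raise ValueError("ABC analysis requires non-negative values")
    total = float(sum(values))
    if total == 0:
        raise ValueError("at least one value must be positive")
    n = len(values)
    # Sort item indices by value, largest first (stable for ties).
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    effort: List[float] = []
    yield_: List[float] = []
    cum = 0.0
    for rank, idx in enumerate(order, start=1):
        cum += values[idx]
        effort.append(rank / n)     # fraction of items taken so far
        yield_.append(cum / total)  # fraction of the total sum covered so far
    return order, effort, yield_


def set_a(values: Sequence[float]) -> List[int]:
    """Indices of set A under the simplified rule: cut at the curve point nearest (0, 1)."""
    order, effort, yield_ = abc_curve(values)
    distances = [math.hypot(x, 1.0 - y) for x, y in zip(effort, yield_)]
    cut = min(range(len(distances)), key=distances.__getitem__)
    return sorted(order[: cut + 1])


# Example: feature importances with a strongly right-skewed distribution.
print(set_a([50, 30, 5, 4, 3, 3, 2, 1, 1, 1]))
```

Here the size of set A follows from the shape of the curve instead of being fixed in advance. The actual cABC limits may differ from this single-point rule.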
Results
Experimental results show that recursive cABC analysis limits the dimensionality of data projections to the minimum at which the relevant information is still preserved. It also directs feature selection in machine learning to the most important class-relevant information, which includes filtering nonsense variables out of feature sets. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification on data not used for feature selection.
Conclusions
In its recursive variant, cABC analysis provides a computationally precise means of reducing information to a minimum. The minimum results from computing the number k of most relevant items rather than from a decision to select the k best items from a list. Furthermore, precise criteria for stopping the reduction process are available. Reducing the data to the most important features can increase human comprehension of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis", available at https://pypi.org/project/cABCanalysis/.
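The recursive idea can be sketched by applying a set-A cut again to the items of the previous set A. This sketch makes several assumptions. The cut uses a simplified rule: keep items up to the ABC-curve point nearest (0, 1). The stopping rule is hypothetical: stop when set A has at most `min_items` items, when a cut removes nothing, or at `max_depth`. It is not the package's stopping criteria, and the names `pareto_cut` and `recursive_set_a` are also hypothetical.

```python
import math
from typing import List, Sequence


def pareto_cut(values: Sequence[float]) -> List[int]:
    """Simplified set-A rule: indices up to the ABC-curve point nearest (0, 1)."""
    total = float(sum(values))
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    best_k, best_d, cum = 1, math.inf, 0.0
    for k, idx in enumerate(order, start=1):
        cum += values[idx]
        d = math.hypot(k / n, 1.0 - cum / total)  # distance to the ideal point
        if d < best_d:
            best_k, best_d = k, d
    return order[:best_k]


def recursive_set_a(values: Sequence[float], min_items: int = 2,
                    max_depth: int = 10) -> List[List[int]]:
    """Apply the cut repeatedly to set A and return the index set of each level.

    Hypothetical stopping rule: stop when set A has at most min_items items,
    when its values sum to zero, when a cut removes no item, or at max_depth.
    """
    if any(v < 0 for v in values):
        raise ValueError("ABC analysis requires non-negative values")
    current = list(range(len(values)))
    levels: List[List[int]] = []
    for _ in range(max_depth):
        if len(current) <= min_items:
            break
        sub = [values[i] for i in current]
        if sum(sub) == 0:
            break
        # Map positions within `sub` back to original feature indices.
        kept = sorted(current[j] for j in pareto_cut(sub))
        if len(kept) == len(current):
            break
        levels.append(kept)
        current = kept
    return levels


importances = [100, 90, 80, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1, 1, 1]
for depth, feature_ids in enumerate(recursive_set_a(importances), start=1):
    print(depth, feature_ids)
```

Each returned level is a subset of the previous one. The size of the last level plays the role of a computed k rather than a k chosen beforehand.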
Publisher
Research Square Platform LLC