Affiliation:
1. Faculty of Economics and Business Administration, Sofia University St. Kliment Ohridski
Abstract
The data-centric approach is a newly explored concept in which attention is given to data optimization methodologies and techniques to improve model performance, rather than to machine learning models and hyperparameter tuning. This paper suggests an effective data optimization methodology for imbalanced small datasets that improves machine learning model performance. The paper focuses on providing an effective solution when the number of observations is not enough to construct a machine learning model with high values of the estimated magnitudes. For example, the majority of the observations are labeled as one class (the majority class), and the rest as the other, commonly considered the class of interest (the minority class). The proposed methodology does not depend on the applied classification models; rather, it is based on the properties of the data resampling approach to systematically enhance and optimize the training dataset. The paper examines numerical experiments applying the data-centric optimization methodology and compares them with results previously obtained by other authors.
Publisher
Faculty of Organisation and Informatics