Affiliation:
1. Department of Information Systems, Faculty of Informatics, Kaunas University of Technology, 44249 Kaunas, Lithuania
Abstract
This study introduces a novel performance-based weighting scheme for ensemble learning based on the Shapley value. The scheme uses the reciprocal of binary cross-entropy as a base learner's performance metric and estimates each learner's Shapley value to measure its overall contribution to equally weighted ensembles of various sizes. Two variants of this strategy were empirically compared with a single monolithic model and with other static weighting strategies on two large banking-related datasets. The variant that discards learners with a negative Shapley value ranked first or second when constructing homogeneous ensembles, whereas for heterogeneous ensembles it achieved detection performance better than or similar to that of the other weighting strategies tested. Although its main limitation is the computational complexity of calculating Shapley values, the explored weighting strategy could be considered a generalization of performance-based weighting.
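The weighting idea summarized above can be illustrated with a small sketch. It is not the authors' implementation; it assumes that an equally weighted ensemble averages the learners' predicted probabilities, that the value of a coalition S of learners is v(S) = 1 / BCE of that average (with v of the empty coalition taken as 0), and that each learner's Shapley value is phi_i = sum over S not containing i of |S|!(n - |S| - 1)!/n! * (v(S with i) - v(S)). How Shapley values are turned into final weights is also an assumption here: weights proportional to phi, with the "discard" variant zeroing negative values. All function names are hypothetical.

```python
import math
from itertools import combinations


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def coalition_value(coalition, predictions, y_true):
    """v(S): reciprocal BCE of the equally weighted average of learners in S.

    Assumption: the empty coalition is worth 0.
    """
    if not coalition:
        return 0.0
    averaged = [
        sum(predictions[i][k] for i in coalition) / len(coalition)
        for k in range(len(y_true))
    ]
    return 1.0 / binary_cross_entropy(y_true, averaged)


def shapley_values(predictions, y_true):
    """Exact Shapley value of every learner.

    Enumerates all 2^(n-1) coalitions per learner, which is why exact
    computation becomes expensive as the ensemble grows.
    """
    n = len(predictions)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! shared by coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                gain = (coalition_value(subset + (i,), predictions, y_true)
                        - coalition_value(subset, predictions, y_true))
                phi[i] += weight * gain
    return phi


def shapley_weights(phi, discard_negative=True):
    """Hypothetical normalisation: weights proportional to Shapley values.

    With discard_negative=True, learners with a negative value get weight 0.
    """
    kept = [max(v, 0.0) if discard_negative else v for v in phi]
    total = sum(kept)
    return [v / total for v in kept]


if __name__ == "__main__":
    y = [1, 0, 1, 1, 0]
    preds = [
        [0.9, 0.2, 0.8, 0.7, 0.1],  # accurate learner
        [0.6, 0.4, 0.6, 0.5, 0.4],  # weak learner
        [0.2, 0.8, 0.3, 0.4, 0.9],  # learner worse than chance
    ]
    phi = shapley_values(preds, y)
    print("Shapley values:", phi)
    print("weights:", shapley_weights(phi))
```

Because v of the empty coalition is 0, the Shapley values sum to v of the full ensemble (the efficiency property), so the normalising total is positive whenever the full ensemble has a finite loss.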
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science