Abstract
This work extends the machine learning method of bagging (Bootstrap AGGregatING) by averaging classifiers with weights derived from a distance function. The function finds the shortest distance from each data point to the classification boundary, using the Manhattan distance for decision trees and the Mahalanobis distance for Linear Discriminant Analysis (LDA); we call this approach modified bagging. By replacing equal-weight voting with this weighted voting system, the classification error is reduced. Modified bagging is thus a viable option for reducing variance, one component of the classification error. Our analysis shows that modified bagging yields a statistically significant improvement on Ripley's data set across different bootstrap sample sizes.
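To make the idea concrete, the following is a minimal sketch of distance-weighted bagging with a two-class LDA base learner. It is illustrative only, not the authors' exact procedure: function names are hypothetical, the weight is the Euclidean distance from a test point to each bootstrap classifier's linear boundary (a stand-in for the paper's Mahalanobis-based measure), and each classifier's vote is scaled by that distance before averaging.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_lda(X, y):
    """Two-class LDA: returns (w, b) of the decision boundary w @ x + b = 0."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix
    S = (np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)      # discriminant direction
    b = -w @ (m0 + m1) / 2               # boundary passes through the midpoint
    return w, b

def modified_bagging_predict(X_train, y_train, X_test, n_boot=25):
    """Bagged LDA where each vote is weighted by distance to that classifier's boundary."""
    votes = np.zeros(len(X_test))
    total = np.zeros(len(X_test))
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_train), len(X_train))   # bootstrap sample
        w, b = fit_lda(X_train[idx], y_train[idx])
        score = X_test @ w + b
        dist = np.abs(score) / np.linalg.norm(w)  # distance to the linear boundary
        votes += dist * (score > 0)               # weighted vote for class 1
        total += dist
    return (votes / total > 0.5).astype(int)      # weighted majority decision
```

In ordinary bagging every bootstrap classifier would contribute an equal vote; here a classifier whose boundary lies far from the test point (i.e., a confident classifier) contributes more, which is the weighted-voting idea described above.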