Affiliations:
1. University of Western Ontario, London, Canada
2. University of Waterloo, Waterloo, Canada
Abstract
The k-nearest-neighbor (kNN) approach is a simple and effective nonparametric algorithm for classification. One drawback of kNN is that it can give only coarse estimates of class probabilities, particularly for small values of k. To avoid this drawback, we propose a new nonparametric classification method based on nearest neighbors conditional on each class (kCNN): the proposed approach computes the distance between a new instance and the kth nearest neighbor of each class, estimates posterior probabilities of class membership from these distances, and assigns the instance to the class with the largest posterior probability. We prove that the proposed approach converges to the Bayes classifier as the size of the training data increases. Further, we extend the proposed approach to an ensemble method. Experiments on benchmark data sets show that both kCNN and its ensemble version on average outperform kNN, weighted kNN, probabilistic kNN, and two similar algorithms (LMkNN and MLM-kHNN) in terms of error rate. A simulation shows that kCNN may be useful for estimating posterior probabilities when the class distributions overlap.
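To make the procedure concrete, the following is a minimal Python sketch of the class-conditional kth-nearest-neighbor idea described in the abstract, assuming Euclidean distance and the standard kNN density estimate for the posterior; the paper's exact estimator and tie-handling may differ, and the helper names kcnn_posteriors and kcnn_predict are hypothetical.

```python
import numpy as np

def kcnn_posteriors(X_train, y_train, x, k=3):
    """Estimate class posteriors for a query point x from the
    class-conditional kth-nearest-neighbor distances (kCNN idea).

    Sketch assumption: p(x | c) is approximated by the standard kNN
    density estimate k / (n_c * V(r_c)), where r_c is the distance
    from x to the kth nearest neighbor within class c and V(r_c) is
    proportional to r_c ** p (the constant cancels on normalization).
    """
    classes = np.unique(y_train)
    p = X_train.shape[1]          # feature dimension
    n = len(y_train)
    scores = {}
    for c in classes:
        Xc = X_train[y_train == c]
        dists = np.sort(np.linalg.norm(Xc - x, axis=1))
        r_c = dists[min(k, len(dists)) - 1]   # kth NN distance within class c
        n_c = len(Xc)
        # prior (n_c / n) times density estimate k / (n_c * r_c**p);
        # the small floor guards against a zero distance
        scores[c] = (n_c / n) * k / (n_c * max(r_c, 1e-12) ** p)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def kcnn_predict(X_train, y_train, x, k=3):
    """Assign x to the class with the largest estimated posterior."""
    post = kcnn_posteriors(X_train, y_train, x, k)
    return max(post, key=post.get)
```

Because the class counts cancel, each unnormalized score in this sketch scales as r_c raised to the power -p, so the instance is assigned to the class whose kth conditional neighbor is closest, matching the intuition stated in the abstract.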
Funder
Social Sciences and Humanities Research Council of Canada
Cited by: 17 articles.