Author:
Ali Najat,Neagu Daniel,Trundle Paul
Abstract
Abstract
Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour classification (k-NN) is one of the most popular distance-based algorithms. This classification is based on measuring the distances between the test sample and the training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data can be described as a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data, which is binary data. In this paper, several similarity measures have been defined based on a combination between well-known distances for both numerical and binary data, and to investigate k-NN performances for classifying such heterogeneous data sets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better for heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data need personalised similarity measures adapted to the data characteristics.
Publisher
Springer Science and Business Media LLC
Subject
General Earth and Planetary Sciences,General Physics and Astronomy,General Engineering,General Environmental Science,General Materials Science,General Chemical Engineering
Reference53 articles.
1. Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam
2. Shavlik JW, Dietterich T, Dietterich TG (1990) Readings in machine learning. Morgan Kaufmann, Los Altos
3. Cover TM, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
4. Tan P-N (2018) Introduction to data mining. Pearson Education, Chennai
5. Wettschereck D (1994) A study of distance-based machine learning algorithms
Cited by
128 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献