Affiliation:
1. School of Computer Science, McGill University, Canada
2. Epidemiology & Biostatistics, McGill University, Canada
3. IIMAS-UNAM, Ciudad de Mexico, Mexico
Abstract
Clustering is often considered the most important problem in unsupervised learning in data mining. It deals with finding structure in a collection of unlabeled data. A simple definition is the following: clustering is the process of organizing data elements into groups, called clusters, whose members are similar to each other in some way. Many clustering algorithms exist (Gan, Ma, & Wu, 2007). Those based on proximity graphs, which are nontraditional from a statistician's point of view, originate in computational geometry and are powerful and often elegant (Bhattacharya, Mukherjee, & Toussaint, 2005). A proximity graph is a graph formed from a collection of elements, or points, by connecting with an edge those pairs of points that satisfy a particular neighbor relationship with each other. Because of their geometric nature, proximity-graph-based clustering techniques can make data clusters easy to visualize clearly. Proximity graphs have also been shown to improve instance-based learning algorithms, such as the k-nearest neighbor classifiers used in the standard nonparametric approach to classification (Bhattacharya, Mukherjee, & Toussaint, 2005). Furthermore, the most powerful and robust methods for clustering turn out to be those based on proximity graphs (Koren, North, & Volinsky, 2006). Many examples have been reported in which proximity-graph-based methods perform very well where traditional methods fail (Zahn, 1971; Choo, Jiamthapthaksin, Chen, Celepcikay, Giusti, & Eick, 2007).
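As an illustration of the definition above, the following minimal Python sketch builds one particular proximity graph, the relative neighborhood graph, in which two points are joined when no third point is closer to both of them than they are to each other. It then reads clusters off as the connected components left after long edges are dropped. The choice of this graph, the function names, the sample points, and the `max_length` threshold are assumptions made for illustration only; the abstract does not prescribe a specific graph or cutting rule.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the edges (i, j), i < j, of the relative neighborhood graph."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(points[i], points[j])
        # The pair is blocked if some third point is closer to both
        # endpoints than the endpoints are to each other.
        blocked = any(
            max(dist(points[i], points[k]), dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            edges.add((i, j))
    return edges


def clusters_from_edges(points, edges, max_length):
    """Connected components after dropping edges longer than max_length.

    max_length is a hypothetical, user-chosen threshold.
    """
    adjacency = {i: [] for i in range(len(points))}
    for i, j in edges:
        if dist(points[i], points[j]) <= max_length:
            adjacency[i].append(j)
            adjacency[j].append(i)

    seen, clusters = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Two well-separated groups of three points each (made-up sample data).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
edges = relative_neighborhood_graph(points)
clusters = clusters_from_edges(points, edges, max_length=2.0)
```

The brute-force construction checks every pair against every other point, so it takes cubic time and is suitable only for small inputs; it is meant to make the neighbor relationship concrete, not to be efficient.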
References (22 articles)
1. Bhattacharya, B., Mukherjee, K., & Toussaint, G. T. (2005). Geometric decision rules for high dimensions. In Proceedings of the 55th Session of the International Statistical Institute. Sydney, Australia.
2. Chazelle, B., Edelsbrunner, H., Guibas, L. J., Hershberger, J. E., Seidel, R., & Sharir, M. (1990). Slimming down by adding; selecting heavily covered points. In Proceedings of the Sixth Annual Symposium on Computational Geometry (pp. 116-127). Berkeley, California, United States.
3. Choo, J., Jiamthapthaksin, R., Chen, C., Celepcikay, O. U., Giusti, C., & Eick, C. F. (2007). MOSAIC: A Proximity Graph Approach for Agglomerative Clustering. In Data Warehousing and Knowledge Discovery of Lecture Notes in Computer Science (pp. 231-240). Regensburg, Germany: Springer Berlin / Heidelberg.
4. An algorithm for computing the restriction scaffold assignment problem in computational biology
5. Nearest Neighbour Editing and Condensing Tools–Synergy Exploitation