Authors:
Kannammal A, Sindhu P, Santhiya R, Sujitha S, Yuvetha S
Abstract
The k-means algorithm is widely used in clustering applications, but it requires a complete data matrix. Missing data, however, is common in many applications. Mainstream approaches to clustering incomplete data reduce the missing-data problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method is a simple extension of k-means clustering to missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when a substantial fraction of the data is missing.
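The abstract describes the method only at a high level. One simple way to extend k-means to missing data, rather than deleting incomplete rows or fixing a single imputed value up front, is to alternate between (a) running k-means on a working copy of the data and (b) overwriting only the missing cells with the coordinates of each row's assigned centroid. The sketch below assumes that alternating scheme and is an illustration, not the authors' implementation. Missing values are assumed to be encoded as `NaN`; the helper names (`kpod_sketch`, `lloyd`, `assign`, `farthest_point_init`), the column-mean starting fill, and the deterministic farthest-point seeding are hypothetical choices made for this example.

```python
import numpy as np


def assign(X, centers):
    """Return the index of the nearest center (squared Euclidean) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd(X, centers, n_iter=100):
    """Plain Lloyd's k-means on a complete matrix X, started from `centers`."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        done = np.allclose(new_centers, centers)
        centers = new_centers
        if done:
            break
    return assign(X, centers), centers


def farthest_point_init(X, k):
    """Hypothetical deterministic seeding: start at row 0, then repeatedly
    add the row farthest from all centers chosen so far."""
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()


def kpod_sketch(X, k, n_outer=100):
    """Alternate k-means on a filled copy with centroid-based refilling.

    Assumes missing entries are NaN and every column has at least one
    observed value (otherwise np.nanmean returns NaN for that column).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial working copy: missing cells set to observed column means.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    centers = farthest_point_init(filled, k)
    for _ in range(n_outer):
        labels, centers = lloyd(filled, centers)
        # Overwrite only the missing cells with the assigned centroid's values;
        # observed cells are always copied back from X unchanged.
        refill = np.where(missing, centers[labels], X)
        done = np.allclose(refill, filled)
        filled = refill
        if done:
            break
    return labels, centers, filled
```

After the first refill, each k-means pass cannot increase the within-cluster sum of squares on the working copy, and each refill sets the missing-cell residuals to zero, so the squared error measured over observed entries alone does not increase between outer iterations. As with ordinary k-means, the result still depends on initialization.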