Affiliation:
1. Department of Computer Engineering, Uludag University, Gorukle Kampusu, Bursa 16059, Turkey
Abstract
In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider [Formula: see text]-means clustering, which requires long execution times on multi-dimensional datasets. To speed up clustering without sacrificing accuracy, we introduce a new algorithm that we term Canopy[Formula: see text]. The algorithm utilizes canopies and statistical techniques, and its efficient initialization and normalization methodologies contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result of the computation is accurate enough. We compared our algorithm with four popular clustering algorithms. The results show that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. The results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy[Formula: see text] algorithm benefits from early termination, which yields an extra 1.2X performance improvement.
Publisher
World Scientific Pub Co Pte Lt
Subject
Computer Science (miscellaneous)
Cited by
5 articles.