Abstract
The volume of Big Data is too large to be processed with traditional clustering analysis techniques: running time is long, and the computation may not be feasible at all. After analyzing the k-means clustering algorithm, a new algorithm is proposed. The part of k-means that can be parallelized is identified, and the algorithm is improved by redesigning its flow on the MapReduce framework, which addresses the problems above. Experiments show that the new algorithm is feasible and effective.
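The abstract does not say which part of k-means is parallelized. As a minimal sketch, assume it is the assignment of each point to its nearest centroid: that becomes the map step, and recomputing each centroid as the mean of its assigned points becomes the reduce step. The code below runs in memory in plain Python rather than on Hadoop, all function names (`map_assign`, `reduce_update`, `kmeans_iteration`, `kmeans`) are hypothetical, and it is not the paper's implementation.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point, centroids):
    """Map step (assumed): emit (index of nearest centroid, (point, 1)).

    Each point is handled independently, so this step can run in parallel.
    """
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def reduce_update(values):
    """Reduce step (assumed): average all points emitted for one centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    sums = [sum(p[d] for p, _ in values) for d in range(dims)]
    return tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round, simulated in memory."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is kept unchanged.
    return [reduce_update(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat rounds until no centroid moves by more than tol (squared)."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

In a real MapReduce job, each round would typically be a separate job, with the current centroids distributed to every mapper; the driver loop in `kmeans` stands in for that job chaining.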
Publisher
Trans Tech Publications, Ltd.