Affiliation:
1. Utkal University, India
Abstract
Big Data, owing to its complex and diverse nature, poses many challenges for extracting meaningful insights. It calls for smart and efficient algorithms that can cope with the computational complexity and memory constraints arising from their iterative behavior. This issue may be addressed with parallel computing techniques, in which a single machine or multiple machines perform the work simultaneously by dividing the problem into subproblems and assigning private memory to each subproblem. Clustering analysis has proved useful for handling such huge data in recent years. Although many investigations in Big Data analysis are ongoing, this work addresses the issue by using Canopy and K-Means++ clustering to process large-scale data in a shorter amount of time without memory constraints. To assess the suitability of the approach, several data sets ranging from small to very large, drawn from diverse fields of application, are considered. The experimental results indicate that the proposed approach is fast and accurate.
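The abstract does not say exactly how Canopy and K-Means++ are combined. The sketch below assumes one common pattern: a cheap Canopy pass (loose threshold `t1`, tight threshold `t2`) estimates the number of clusters k, K-Means++ then picks k initial centers by D²-weighted sampling, and standard Lloyd iterations refine them. All function names and the threshold values are hypothetical, and the sketch runs on a single machine; it is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy(points: Sequence[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2 (t1 > t2 > 0)."""
    if not t1 > t2 > 0:
        raise ValueError("need t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]  # first unprocessed point becomes a canopy center
        # Remaining points within the loose threshold t1 join this canopy.
        members = [p for p in remaining if math.dist(p, center) < t1]
        canopies.append((center, members))
        # Points within the tight threshold t2 (including the center) leave the pool.
        remaining = [p for p in remaining if math.dist(p, center) >= t2]
    return canopies


def kmeans_pp_seeds(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """K-Means++ seeding: each new center is sampled with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            break  # fewer distinct points than k
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points: Sequence[Point], centers: List[Point], max_iter: int = 100):
    """Standard k-means (Lloyd) refinement starting from the given centers."""
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point (ties go to the lowest index).
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers: List[Point] = []
        for j, c in enumerate(centers):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if cluster:
                new_centers.append(tuple(sum(xs) / len(cluster) for xs in zip(*cluster)))
            else:
                new_centers.append(c)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


def canopy_kmeanspp(points: Sequence[Point], t1: float, t2: float, seed: int = 0):
    """Hypothetical pipeline: canopy count sets k, K-Means++ seeds, Lloyd refines."""
    k = len(canopy(points, t1, t2))
    seeds = kmeans_pp_seeds(points, k, random.Random(seed))
    return lloyd(points, seeds)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = canopy_kmeanspp(data, t1=3.0, t2=2.0)
    print(len(centers), labels)
```

On the toy data the Canopy pass finds two groups, so k = 2 without being supplied by hand. In a parallel setting of the kind the abstract describes, the assignment step is the natural part to split across workers, since each point's nearest center is computed independently.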