Abstract
In this paper, we focused on cluster analysis as the most commonly used technique for grouping objects. By clustering data, we can extract groups of similar objects from a collection. First, we defined Big Data and clustering to provide background for the rest of the paper. We presented the most popular techniques for clustering data, including partitioning, hierarchical clustering, density-based clustering, and grid-based clustering. Big Data refers to large amounts of data. High precision in big data can increase confidence in decision-making, and better estimates can help increase efficiency and reduce costs and risks. Various methods and approaches are used for data processing, including clustering, classification, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms, and the nearest neighbor method. A cluster represents a set of objects from the same class: similar objects are grouped together, and dissimilar objects are placed in separate groups. We described the K-means algorithm, hierarchical clustering, the DBSCAN algorithm for density-based clustering, and the STING algorithm for grid-based clustering.
Publisher
Centre for Evaluation in Education and Science (CEON/CEES)