Affiliation:
1. Punjab University, Gujranwala Campus, Pakistan
2. University of Batna, Batna City, Algeria
Abstract
Because of the exponential growth of high-dimensional datasets, conventional database querying strategies are inadequate for extracting useful information, and analysts must now devise novel techniques to meet these demands. Such massive expression data raises a host of new computational challenges, driven both by the rise in the number of data objects and by the increase in the number of features/attributes. Preprocessing the data with a reliable dimensionality reduction method improves the efficiency and accuracy of mining operations on high-dimensional data; we therefore also survey the views of numerous researchers on this topic. Cluster analysis is a data analysis tool that has recently gained prominence in a number of disciplines. K-means, a common partitioning-based clustering algorithm, looks for a fixed number of clusters that are represented solely by their centroids. However, its results depend heavily on the initial positions of the cluster centers, and the number of distance calculations rises dramatically as the data grow in size and complexity. This is because building a detailed model typically requires a large and diverse amount of training data, and processing such a broad collection of samples can be very time-consuming. For huge datasets in particular, there is a trade-off to weigh when forming clusters: speed versus accuracy. The k-means method is commonly used to compress and summarize vector data, as well as to cluster it. We present Asynchronous Selective Batched K-means (ASB K-means), a fast and memory-efficient GPU-based method. Our method can be tuned to use far less GPU memory than the size of the full dataset, a significant improvement over earlier GPU-based k-means methods, so datasets too large to fit in GPU memory can still be clustered. The approach uses a batched design and applies the triangle inequality in each k-means iteration to skip a data point whenever its membership assignment, i.e., the cluster it belongs to, remains unchanged, allowing it to handle large datasets efficiently. This reduces the number of data points that must be transferred between the CPU's RAM and the GPU's global memory.
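To make the pruning rule above concrete, the following is a minimal CPU-side sketch in Python/NumPy of a triangle-inequality filter for k-means, written under our own assumptions rather than taken from the paper: the function name kmeans_triangle_pruned, its parameters, and the single upper-bound bookkeeping are hypothetical simplifications, and the actual ASB K-means runs batched on the GPU. The sketch only illustrates the property the abstract relies on: when no other center survives the filter, a point's assignment is provably unchanged without recomputing any of its distances, which is what allows unchanged points to stay in CPU RAM instead of being copied to the GPU's global memory.

import numpy as np

def kmeans_triangle_pruned(X, k, n_iter=50, seed=0):
    """k-means with a triangle-inequality filter (illustrative CPU sketch).

    If d(c_j, c_i) >= 2 * d(x, c_j) for the current center c_j, then c_i
    cannot be closer to x than c_j, so d(x, c_i) is never computed; when
    every other center is filtered out this way, the point's assignment
    is known to be unchanged without touching the point at all.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    # Exact initial assignment and exact upper bounds d(x, assigned center).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    upper = d[np.arange(len(X)), assign]

    for _ in range(n_iter):
        # Pairwise center-to-center distances drive the filter.
        cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        changed = False
        for i in range(len(X)):
            j = assign[i]
            # Centers the triangle inequality cannot rule out (j is among them).
            cand = np.where(cc[j] < 2.0 * upper[i])[0]
            if len(cand) <= 1:
                continue  # assignment provably unchanged; skip this point
            dists = np.linalg.norm(X[i] - centers[cand], axis=1)
            best = cand[dists.argmin()]
            if best != j:
                changed = True
            assign[i], upper[i] = best, dists.min()

        # Recompute centers and keep the upper bounds valid after they move.
        old = centers.copy()
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        upper += np.linalg.norm(centers - old, axis=1)[assign]
        if not changed:
            break
    return centers, assign

For example, calling kmeans_triangle_pruned(X, k=3) on a few well-separated Gaussian blobs converges in a handful of iterations while computing point-to-center distances only for points whose assignment might actually change; in a GPU batched setting, that same unchanged/changed split would decide which points need to be transferred to device memory at all.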
Publisher
Mesopotamian Academic Press
Subject
General Medicine, General Earth and Planetary Sciences, General Environmental Science
Cited by
9 articles.