Large Scale Data Using K-Means

Published: 2023-02-13
Pages: 36-45
Container-title: Mesopotamian Journal of Big Data
Short-container-title: MJBD

Author:

Zaib Raheela¹, Ourabah Ourlis²

Affiliation:

1. Punjab University, Gujranwala Campus, Pakistan

2. University of Batna, Batna City, Algeria

Abstract

Because of the exponential growth of high-dimensional datasets, conventional database querying methods are inadequate for extracting useful information, and researchers must devise new techniques to meet this demand. Such massive data poses many new computational challenges, owing both to the growing number of data objects and to the growing number of features/attributes. Preprocessing the data with a reliable dimensionality reduction method improves the efficiency and accuracy of mining operations on high-dimensional data. We therefore survey the views of a number of researchers. Cluster analysis is a data analysis tool that has recently gained prominence in many disciplines. K-means, a common partitioning-based clustering algorithm, searches for a fixed number of clusters, each represented only by its centroid. However, the results depend heavily on the initial positions of the cluster centers, and the number of distance calculations rises sharply as data complexity increases, because building an accurate model typically requires a large, distributed set of training data. Training on a large set of features can be time-consuming. For very large datasets in particular, there is a trade-off between speed and accuracy when building a model.

The k-means method is commonly used to compress and summarize vector data, as well as to cluster it. For exact k-means, we present Asynchronous Selective Batched K-means (ASB K-means), a fast and memory-efficient GPU-based method. Unlike earlier GPU-based k-means methods, our method can be configured to use far less GPU memory than the size of the full dataset, so datasets too large to fit in GPU memory can still be clustered. The approach uses a batched architecture and applies the triangle inequality in each k-means iteration to skip a data point if its membership assignment, i.e. the cluster it belongs to, remains unchanged. This reduces the number of data points that must be transferred between the CPU's RAM and the GPU's global memory, allowing the method to handle large datasets efficiently.
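
The skip test described in the abstract, where the triangle inequality lets a point be left untouched when its cluster provably cannot have changed, can be illustrated with a small CPU-only sketch. This is not the authors' GPU implementation: the function names (dist, kmeans_with_skips), the single upper bound kept per point, and the running per-cluster sums are all assumptions chosen for illustration. The idea is that if a point lies within half the distance from its center to the nearest other center, no other center can be closer, so the point does not have to be read again in that iteration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_with_skips(points, init_centers, max_iter=100):
    """Lloyd-style k-means (k >= 2) that skips provably stable points.

    Hypothetical helper for illustration only.
    Returns (centers, assign, skipped_per_iter).
    """
    k, dim = len(init_centers), len(points[0])
    centers = [list(c) for c in init_centers]

    # Initial pass: every point is compared with every center once.
    assign, upper = [], []
    for p in points:
        d = [dist(p, c) for c in centers]
        j = min(range(k), key=d.__getitem__)
        assign.append(j)
        upper.append(d[j])  # upper[i] >= distance from point i to its center

    # Running per-cluster sums, so skipped points never need to be re-read.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, assign):
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]

    skipped_per_iter = []
    for _ in range(max_iter):
        # Move each center to the mean of its cluster (empty clusters stay put).
        new_centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(k)
        ]
        shift = [dist(centers[j], new_centers[j]) for j in range(k)]
        centers = new_centers
        if max(shift) == 0.0:
            break  # no center moved, so no assignment can change

        # Half the distance from each center to its nearest other center.
        half_gap = [
            0.5 * min(dist(centers[j], centers[m]) for m in range(k) if m != j)
            for j in range(k)
        ]

        skipped = 0
        for i, p in enumerate(points):
            j = assign[i]
            upper[i] += shift[j]  # bound stays valid after the center moved
            if upper[i] <= half_gap[j]:
                skipped += 1  # triangle inequality: no other center is closer
                continue
            # Bound too loose: fall back to a full comparison for this point.
            d = [dist(p, c) for c in centers]
            new_j = min(range(k), key=d.__getitem__)
            upper[i] = d[new_j]
            if new_j != j:
                assign[i] = new_j
                counts[j] -= 1
                counts[new_j] += 1
                for t in range(dim):
                    sums[j][t] -= p[t]
                    sums[new_j][t] += p[t]
        skipped_per_iter.append(skipped)

    return centers, assign, skipped_per_iter
```

Calling kmeans_with_skips(points, init_centers) with at least two initial centers returns the final centers, the cluster index of each point, and the number of points skipped in each iteration. In a batched GPU setting, the skipped points would be the ones that need not be copied from host memory to the device, which is presumably where the reduction in transfers comes from.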

Publisher

Mesopotamian Academic Press

Subject

General Medicine, General Earth and Planetary Sciences, General Environmental Science

References: 20 articles.

1. S. Bettoumi, C. Jlassi, and N. Arous, “Comparative Study of k-means Variants for mono-view clustering,” in International Conference for Signal and Image Processing (ATSIP), 2016, pp. 183–188.

2. A. Alsayat, “Social Media Analysis using Optimized K-Means Clustering,” in IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), 2016.

3. M. Belal and A. Daoud, “A new algorithm for cluster initialization,” World Academy of Science, Engineering and Technology, vol. 4, pp. 74–76, 2005.

4. S. Deelers and S. Auwatanamongkol, “Enhancing K-means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance,” International Journal of Computer Science, vol. 2, no. 4, pp. 247–252, 2007.

5. P. Valarmathie, M. Srinath, and K. Dinakaran, “An increased performance of clustering high dimensional data through dimensionality reduction technique,” Journal of Theoretical and Applied Information Technology, vol. 13, pp. 271–273, 2009.

Cited by 9 articles.

1. PILL: Plug into LLM with adapter expert and attention gate;Applied Soft Computing;2024-11

2. Statistical and density-based clustering of geographical flows for crowd movement patterns recognition;Applied Soft Computing;2024-09

3. Keyframe control for customizable choreography with style maintenance;Computers and Electrical Engineering;2024-07

4. DATE: a video dataset and benchmark for dynamic hand gesture recognition;Neural Computing and Applications;2024-06-07

5. Impacts of digitization on operational efficiency in the banking sector: Thematic analysis and research agenda proposal;International Journal of Information Management Data Insights;2024-04