Improving the K-Means Clustering Algorithm Oriented to Big Data Environments

Published:2021 Page:289-308
ISSN:2327-0411
Container-title:Handbook of Research on Natural Language Processing and Smart Service Systems

Author:

Pérez Ortega Joaquín¹, Almanza Ortega Nelva Nely², Vega Villalobos Andrea¹, Aguirre L. Marco A.³, Zavala Díaz Crispín⁴, Ortiz Hernandez Javier¹, Hernández Gómez Antonio¹

Affiliation:

1. Tecnológico Nacional de México, Mexico & CENIDET, Mexico

2. Tecnológico Nacional de México, Mexico & Instituto Tecnológico de Tlalnepantla, Mexico

3. Tecnológico Nacional de México, Mexico & Instituto Tecnológico de Ciudad Madero, Mexico

4. Universidad Autónoma del Estado de Morelos, Mexico

Abstract

In recent years, the volume of natural-language text in digital format has grown dramatically. Obtaining useful information from such large volumes of data requires new specialized techniques and efficient algorithms. Text mining consists of extracting meaningful patterns from texts, and clustering is one of its basic approaches. The most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of the k-means algorithm: the process stops as soon as the number of objects that change their assigned cluster in the current iteration is larger than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. Notably, results generally improve as the volume of text increases, particularly for texts in big data environments.
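
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function name `kmeans_early_stop`, the random initialization, the squared Euclidean distance, the `max_iter` cap, and the fallback to the usual "no object changed cluster" test are all assumptions made for illustration. Only the comparison of the current change count against the previous one comes from the abstract.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_early_stop(points, k, max_iter=100, seed=0):
    """Hypothetical k-means variant with the change-count stopping rule.

    Returns (labels, centroids, iterations).
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input objects chosen at random.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    prev_changes = None
    iterations = 0

    for iterations in range(1, max_iter + 1):
        # Assignment step: attach every object to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        changes = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels

        # Assumption: keep the classical criterion (no object changed cluster).
        if changes == 0:
            break
        # Rule from the abstract: stop when more objects changed cluster
        # in this iteration than in the previous one.
        if prev_changes is not None and changes > prev_changes:
            break
        prev_changes = changes

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)

    return labels, centroids, iterations


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
    labels, centroids, iterations = kmeans_early_stop(data, k=2)
    print(labels, centroids, iterations)
```

On small, well-separated data the change count shrinks steadily, so the extra rule never fires and the sketch behaves like ordinary k-means. The rule is meant to cut off the long tail of iterations in which assignments oscillate, which is where savings on large text collections would be expected.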

Publisher

IGI Global
