Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Published:2019-03-20 Issue:2 Volume:23 Page:362-365
ISSN:1883-8014
Container-title:Journal of Advanced Computational Intelligence and Intelligent Informatics
language:en
Short-container-title:JACIII

Author:

Luo Nan-Chao

Abstract

Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low precision and long running times when applied to it. To address these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. Feature words are extracted from the data with the chi-square test to obtain a feature word set. The feature set is hierarchically clustered, the TF-IDF value of each word in each cluster is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves mining accuracy and reduces time consumption.
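The first two stages named in the abstract (chi-square feature selection, then TF-IDF weighting into a vector space model) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy corpus, the 2x2 contingency form of the chi-square statistic, and the TF-IDF variant (relative term frequency times log(N/df)) are all assumptions chosen for illustration. The hierarchical clustering, bee colony, and K-means stages are omitted.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, target):
    """Chi-square statistic of `term` against class `target` (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1      # term present, document in class
        elif has_term:
            b += 1      # term present, document outside class
        elif in_class:
            c += 1      # term absent, document in class
        else:
            d += 1      # term absent, document outside class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes)
              for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def tfidf_vectors(docs, features):
    """Build one TF-IDF vector per document over the selected features."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in features)
    idf = {t: math.log(n / df[t]) for t in features if df[t]}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[t] / total * idf.get(t, 0.0) for t in features])
    return vectors


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["match", "goal", "team"],
    ["team", "goal", "score"],
]
labels = ["finance", "finance", "sport", "sport"]

features = select_features(docs, labels, 4)
vectors = tfidf_vectors(docs, features)
```

In this toy corpus the terms that appear in both documents of one class and in none of the other class receive the highest chi-square scores, so they form the feature set. The resulting vectors are the kind of input a clustering step such as K-means would operate on.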

Publisher

Fuji Technology Press Ltd.

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Human-Computer Interaction

References: 8 articles.

1. K.-W. Pang and H.-L. Chan, “Data mining-based algorithm for storage location assignment in a randomised warehouse,” Int. J. of Production Research, Vol.55, No.14, pp. 4035-4052, 2016.

2. E. R. Faria et al., “Minas: multiclass learning algorithm for novelty detection in data streams,” Data Mining and Knowledge Discovery, Vol.30, No.3, pp. 640-680, 2016.

3. G. Yang, Y. Zhang, J. Yang, et al., “Automated classification of brain images using wavelet-energy and biogeography-based optimization,” Multimedia Tools & Applications, Vol.75, No.23, pp. 15601-15617, 2016.

4. D.-S. Pan, “Research on Fuzzy Mining Algorithm for Massive Text Data Under Uncertain Noise,” Microelectronics & Computer, Vol.34, No.9, pp. 129-132, 2017.

5. K. Arasawa and S. Hattori, “Automatic Baseball Video Tagging Based on Voice Pattern Prioritization and Recursive Model Localization,” J. Adv. Comput. Intell. Intell. Inform., Vol.21, No.7, pp. 1262-1279, 2017.

Cited by 5 articles.

1. Categorical Data Clustering: A Bibliometric Analysis and Taxonomy;Machine Learning and Knowledge Extraction;2024-05-07

2. Towards a Long-Term Sino-US Relationship: Implications and Potential Solutions of the Future Chinese Media Strategy for Global Speech Power;Open Journal of Political Science;2023

3. Web-Questionnaire-Based Corpus Creation Under Assumption of Human as Speech Targets;Journal of Advanced Computational Intelligence and Intelligent Informatics;2022-07-20

4. Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals;Sustainability;2021-09-29

5. Maximum-expectation integrated agglomerative nesting data mining model for cultural datasets;Personal and Ubiquitous Computing;2019-07-02