Web Text Categorization Based on Statistical Merging Algorithm in Big Data Environment

Published:2019-07 Issue:3 Volume:10 Page:17-32
ISSN:1941-6237
Container-title:International Journal of Ambient Computing and Intelligence
language:en

Author:

Wang Rujuan¹, Wang Gang²

Affiliation:

1. College of Humanities & Sciences of Northeast Normal University, Changchun, China

2. Northeast Normal University, Changchun, China

Abstract

In modern information technology, finding the information that users actually need quickly, accurately, and comprehensively has become a central research problem. This article first proposes a feature selection method based on a complex network, designed around the structure and content characteristics of large-scale web text. The preprocessed web text is converted into a complex network in which nodes correspond to the terms (entries) in the text and edges correspond to the links between those terms; the node degree and the aggregation (clustering) coefficient are then used to select features. Second, text classification is studied from the point of view of data sampling, and a classification method based on density statistics is proposed. In the classification process, this method uses not only the density information of the text feature set but also statistical merging criteria to obtain the difference information of each text feature, and it achieves a better classification effect on large text collections.
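
The abstract does not spell out how the network is built or how the two node measures are combined, so the following is only a minimal sketch under stated assumptions: terms are linked when they co-occur within a sliding window (the `window` parameter is hypothetical), and terms are ranked by degree first and clustering coefficient second (an illustrative choice, not the authors' criterion). All function names are made up for this example.

```python
from itertools import combinations


def build_cooccurrence_network(tokens, window=2):
    """Build an undirected term graph as {term: set of neighbouring terms}.

    Assumption: two terms are linked if they appear within `window`
    positions of each other in the preprocessed token sequence.
    """
    graph = {}
    for i, term in enumerate(tokens):
        graph.setdefault(term, set())  # keep isolated terms as nodes
        for other in tokens[i + 1 : i + window + 1]:
            if other != term:
                graph[term].add(other)
                graph.setdefault(other, set()).add(term)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def rank_terms(tokens, window=2):
    """Rank terms by (degree, clustering coefficient), highest first.

    Assumption: degree is the primary key and the clustering coefficient
    breaks ties; the source does not state how the two are combined.
    """
    graph = build_cooccurrence_network(tokens, window)
    scores = {t: (len(graph[t]), clustering_coefficient(graph, t)) for t in graph}
    return sorted(scores.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))


if __name__ == "__main__":
    sample = ["web", "text", "classification", "web", "text", "network"]
    for term, (degree, coeff) in rank_terms(sample, window=1):
        print(f"{term}: degree={degree}, clustering={coeff:.2f}")
```

A feature selector in this style would keep the top-ranked terms as the feature set passed on to the classifier; the cutoff would have to be chosen empirically.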

Publisher

IGI Global

Subject

Software

Reference 25 articles.

1. The small world of human language

2. Internet of Things and Big Data Analytics Toward Next-Generation Intelligence

3. Application of TF-IDF feature for categorizing documents of online Bangla web text corpus;A.Dhar;Intelligent Engineering Informatics,2018

4. Survey of data mining for microblogs;Z.Ding;Journal of Computer Research & Development,2014

5. A statistical interpretation of term specificity and its application in retrieval

Cited by 19 articles.

1. An unsupervised linguistic-based model for automatic glossary term extraction from a single PDF textbook;Education and Information Technologies;2023-05-06

2. Predicting Violence-Induced Stress in an Arabic Social Media Forum;Intelligent Automation & Soft Computing;2023

3. The Prediction of Consumer Behavior from Social Media Activities;Behavioral Sciences;2022-08-12

4. Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding;PeerJ Computer Science;2022-07-07

5. Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus;International Journal of Ambient Computing and Intelligence;2022-04-29