Abstract
Existing techniques for identifying online text struggle to accurately extract and classify text that spreads and is updated rapidly. To address this problem, this study combines the Naive Bayes algorithm with a two-dimensional feature information gain weighting method. The feature weighting optimizes the Naive Bayes algorithm: a new feature calculation method computes the information gain of each feature along two dimensions, documents and categories, and uses it to improve classification performance. The classification models are then compared on real Chinese and English datasets. The results show that the IGDC-DWNB model achieves classification accuracies of 0.89, 0.89, 0.93, and 0.88 on the Sogou, 20 Newsgroups, Fudan, and Reuters-21578 datasets, respectively, higher than those of the other classification models tested in the same environment. The proposed model therefore offers higher classification accuracy and stronger overall performance, reliability, and robustness in practical applications, and may provide a new direction for big data classification technology.
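The abstract does not give the IGDC-DWNB formulas, so the following is only a minimal sketch of the general idea of information-gain-weighted Naive Bayes, not the authors' method. It assumes a multinomial Naive Bayes with Laplace smoothing, in which each term's log-likelihood is scaled by that term's ordinary (one-dimensional) information gain with respect to the class label, normalised by the mean gain. The function names (`information_gain`, `train`, `predict`) and the four-document toy corpus are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Information gain of each term, based on its presence/absence per document."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    vocab = {t for d in docs for t in d}
    gains = {}
    for t in vocab:
        with_t = Counter(l for d, l in zip(docs, labels) if t in d)
        without_t = Counter(l for d, l in zip(docs, labels) if t not in d)
        n_with = sum(with_t.values())
        cond = (n_with / n) * entropy(list(with_t.values())) \
            + ((n - n_with) / n) * entropy(list(without_t.values()))
        gains[t] = base - cond
    return gains


def train(docs, labels):
    """Fit a multinomial Naive Bayes and attach an information-gain weight per term."""
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    gains = information_gain(docs, labels)
    mean_gain = sum(gains.values()) / len(gains)
    # Assumption: weight = gain / mean gain; fall back to 1.0 if no term is informative.
    weights = {t: (gains[t] / mean_gain if mean_gain > 0 else 1.0) for t in vocab}
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for d, l in zip(docs, labels):
        term_counts[l].update(d)
    loglik = {}
    for c in classes:
        total = sum(term_counts[c].values()) + len(vocab)  # Laplace smoothing
        loglik[c] = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik, weights


def predict(model, doc):
    """Return the class with the highest weighted log-posterior score."""
    prior, loglik, weights = model
    scores = {
        c: prior[c] + sum(weights[t] * loglik[c][t] for t in doc if t in weights)
        for c in prior
    }
    return max(scores, key=scores.get)


# Toy corpus, invented for illustration only.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)
print(predict(model, ["goal", "team"]))
print(predict(model, ["stock", "price"]))
```

In this sketch, terms that separate the classes cleanly (such as "goal" or "stock" in the toy corpus) receive larger weights and therefore contribute more to the score than terms seen in only one document. A two-dimensional scheme like the one the abstract describes would replace the single per-term weight with one that also depends on the document and the category.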
Subject
Computational Mathematics, Computer Science Applications, General Engineering
Cited by
1 article.