Affiliation:
1. School of Marxism, Shangqiu Institute of Technology, Shangqiu 476000, China
Abstract
Automatic keyword extraction from Chinese text suffers from large feature extraction error, low precision of the extracted keywords, and poor real-time performance. To address these problems, an automatic keyword extraction algorithm for Chinese text based on word clustering is designed. First, statistical methods are used to compute the term frequency, document frequency, and inverse document frequency features of keywords; pointwise mutual information is used to measure the degree of interdependence between keywords; and a keyword feature quantification matrix is constructed from the vector space model that relates keywords to feature items. This completes keyword feature quantification and yields the keyword features of the Chinese text. Second, in the preprocessing stage, the average semantic similarity between keywords is calculated to determine how similar the keyword features are, and highly similar features are eliminated; a comprehensive feature value is defined to measure the importance of single-character words in the text, and single-character words of low importance are removed; a Bayesian framework is then used to reduce the dimensionality of the high-dimensional keyword feature data. Third, the mapping results of the keyword vector space model are determined by a word clustering algorithm, the text clusters of the keyword space clustering results are computed by the clustering algorithm, and the keywords are classified by the DBN method. On this basis, an automatic keyword extraction model for Chinese text is designed to perform the extraction. The experimental results show that the designed algorithm effectively reduces feature extraction error and improves extraction efficiency.
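The statistical features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the common definitions tf = count / document length, idf = log(N / df), and a document-level pointwise mutual information PMI(x, y) = log(p(x, y) / (p(x) p(y))), where probabilities are estimated from document co-occurrence. The function names `tf_idf` and `pmi` and the toy corpus are hypothetical, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: tf-idf} dict per document.

    docs is a list of token lists (text assumed already word-segmented).
    Assumed definitions: tf = count / doc length, idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term once
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def pmi(docs):
    """Return {(x, y): PMI} for term pairs that co-occur in some document.

    Probabilities are estimated as the fraction of documents containing
    the term (or both terms); pair keys are stored in sorted order.
    """
    n_docs = len(docs)
    single = Counter()
    pair = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))
        single.update(vocab)
        pair.update(combinations(vocab, 2))
    return {
        (x, y): math.log(
            (c / n_docs) / ((single[x] / n_docs) * (single[y] / n_docs))
        )
        for (x, y), c in pair.items()
    }


# Toy corpus, invented for illustration only.
docs = [
    ["关键词", "提取", "算法"],
    ["关键词", "聚类", "算法"],
    ["文本", "聚类"],
]
tfidf_scores = tf_idf(docs)
pmi_scores = pmi(docs)
```

In this toy corpus, a term that appears in only one document (such as "提取") receives a higher tf-idf weight than one shared by two documents, and a positive PMI marks a pair of terms that co-occur more often than independence would predict. The per-document tf-idf dictionaries are one possible way to fill the rows of a keyword feature matrix of the kind the abstract describes.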
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.