Automatic Keyword Extraction Algorithm for Chinese Text based on Word Clustering

Author:

Pan Rui¹

Affiliation:

1. School of Marxism, Shangqiu Institute of Technology, Shangqiu 476000, China

Abstract

Automatic keyword extraction from Chinese text suffers from large feature extraction errors, low keyword precision, and poor real-time performance. To address these problems, an automatic keyword extraction algorithm for Chinese text based on word clustering is designed. First, the keyword frequency, document frequency, and inverse document frequency features are calculated with a statistical algorithm, the degree of interdependence between keywords is measured with pointwise mutual information, and a keyword-feature quantification matrix is constructed from the vector space model that relates keywords to feature items; this completes keyword feature quantification and realizes keyword feature extraction from Chinese text. Next, in preprocessing, the average semantic similarity of keywords is calculated to determine the similarity of keyword features, and highly similar keyword features are eliminated; a comprehensive feature value is set to measure the importance of single-character words in the text, and single-character words of low importance are removed; a Bayesian framework is then used to reduce the dimensionality of the high-dimensional keyword feature data. The mapping results of the keyword vector space model are then determined by a word clustering algorithm, the text clusters of the keyword space clustering results are calculated by a clustering algorithm, and the keywords are classified by the DBN method. On this basis, an automatic keyword extraction model for Chinese text is designed to realize automatic keyword extraction. The experimental results show that the designed algorithm effectively reduces the feature extraction error and improves extraction efficiency.
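The statistical features named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `tf_idf` and `pmi`, the toy corpus, and the choice to count co-occurrence at the document level are all assumptions made for illustration, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: tf-idf} dict per document (docs: list of token lists)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


def pmi(docs):
    """Pointwise mutual information log(p(x, y) / (p(x) * p(y))) per term pair.

    Assumption: probabilities are estimated from document-level co-occurrence.
    Pairs are keyed as sorted tuples; pairs that never co-occur are omitted.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    return {
        (a, b): math.log((c / n_docs) / ((df[a] / n_docs) * (df[b] / n_docs)))
        for (a, b), c in co.items()
    }


# Hypothetical pre-segmented toy corpus (three short documents).
docs = [
    ["关键词", "提取", "算法"],
    ["关键词", "聚类"],
    ["聚类", "算法", "文本"],
]
scores = tf_idf(docs)
pairs = pmi(docs)
```

A positive PMI value indicates that two terms co-occur more often than independence would predict, and a negative value indicates the opposite. The per-document score dictionaries could serve as rows of a keyword-by-feature matrix in a vector space model.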

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 2 articles.

1. A Richer Vocabulary of Chinese Personality Traits: Leveraging Word Embedding Technology for Mining Personality Descriptors;Journal of Psycholinguistic Research;2024-03-25

2. Keyword Extraction from Scientific Publications Using Local Features and Embedding Model;2023 9th International Conference on Signal Processing and Intelligent Systems (ICSPIS);2023-12-14
