Algorithm for Optimization of Keyword Extraction Based on the Application of a Linguistic Parser

Published:2024-03-28 Issue:2 Volume:23 Page:467-494
ISSN:2713-3206
Container-title:Informatics and Automation
Short-container-title:IA

Author:

Kravchenko Daniil, Kravchenko Yury, Mansour Ali Mahmoud, Mohammad Juman, Pavlov Nikolai

Abstract

This article presents an analytical comparison of constituency parsing and dependency parsing, two types of parsing used in natural language processing (NLP). The study introduces an algorithm that improves keyword extraction by using the parser's noun phrase extraction feature to filter out unsuitable phrases. The algorithm is implemented with three different parsers: spaCy, AllenNLP and Stanza. Its effectiveness was compared with two popular methods (YAKE and RAKE) on a dataset of English texts. Experimental results show that the proposed algorithm with the spaCy parser outperforms the other keyword extraction algorithms in both accuracy and speed. With the AllenNLP and Stanza parsers, the algorithm is also more accurate but requires a much longer execution time. The results allow a more detailed evaluation of the advantages and disadvantages of the parsers studied and indicate directions for further research. The running time of the spaCy parser is significantly shorter than that of the other two parsers because it is a transition-based parser: such parsers apply a deterministic or machine-learned sequence of actions to build the dependency tree step by step. They are typically faster and require less memory than graph-based parsers, which makes them more efficient for parsing large amounts of text. AllenNLP and Stanza, on the other hand, use graph-based parsing models that rely on millions of features, which limits their ability to generalize and slows analysis compared with transition-based parsers. Balancing the accuracy and speed of a linguistic parser remains an open problem that requires further research, because it is important for improving the efficiency of text analysis, especially in applications that need accurate results in real time. To this end, the authors plan to investigate possible ways of achieving this balance.
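
The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the "higher score is better" convention, and the hand-written inputs are all assumptions made for illustration. In practice the noun phrases would come from a parser (with spaCy, for example, from `doc.noun_chunks`), and the candidate scores from a keyword extraction method.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so phrases compare reliably."""
    return " ".join(phrase.lower().split())


def filter_by_noun_phrases(scored_candidates, noun_phrases, top_k=5):
    """Keep only candidates that are noun phrases, best score first.

    scored_candidates: iterable of (phrase, score) pairs; this sketch
                       assumes a higher score means a better keyword.
    noun_phrases:      phrases that a parser marked as noun phrases.
    """
    allowed = {normalize(p) for p in noun_phrases}
    kept = {}
    for phrase, score in scored_candidates:
        key = normalize(phrase)
        if key in allowed:
            # Keep the best score if a phrase occurs more than once.
            kept[key] = max(score, kept.get(key, float("-inf")))
    # Sort by descending score, then alphabetically for stable ties.
    ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Hypothetical stand-ins for parser output and candidate scores.
noun_phrases = ["dependency parsing", "keyword extraction", "the parser"]
candidates = [
    ("keyword extraction", 0.9),
    ("is implemented using", 0.8),  # not a noun phrase, so dropped
    ("Dependency  parsing", 0.7),   # matches after normalization
    ("filter out", 0.6),            # not a noun phrase, so dropped
]

keywords = filter_by_noun_phrases(candidates, noun_phrases)
print(keywords)  # ['keyword extraction', 'dependency parsing']
```

Passing the noun phrases in as a plain list keeps the sketch independent of any particular parser, so the same filter could sit behind spaCy, AllenNLP or Stanza.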

Publisher

SPIIRAS
