SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings

Published: 2021-02-16 Volume: 2021 Page: 1-13
ISSN: 1099-0526
Container-title: Complexity
Language: en
Short-container-title: Complexity

Author:

Al-Matham Rawan N.¹, Al-Khalifa Hend S.¹

Affiliation:

1. Department of Information Technology, College of Computer and Information Sciences, King Saud University, P.O. Box 12371, Riyadh, Saudi Arabia

Abstract

Automatic synonym extraction plays an important role in many natural language processing systems, such as those involving information retrieval and question answering. Recently, research has focused on extracting semantic relations from word embeddings since they capture relatedness and similarity between words. However, word embeddings alone are insufficient for synonym extraction because they cannot distinguish synonymy from other semantic relations. In this paper, we address this problem by proposing the SynoExtractor pipeline, which filters similar word embeddings to retain synonyms based on specified linguistic rules. Our experiments used embeddings trained on the King Saud University Corpus of Classical Arabic (KSUCCA) and the Gigaword corpus with the continuous bag-of-words (CBOW) and skip-gram (SG) models. We evaluated the automatically extracted synonyms by comparing them with Alma’any Arabic synonym thesauri and by arranging a manual evaluation by two Arabic linguists. The results show that the SynoExtractor pipeline improves the precision of synonym extraction compared to using the cosine similarity measure alone. SynoExtractor obtained a mean average precision (MAP) of 0.605 on KSUCCA, a 21% improvement over the baseline, and a MAP of 0.748 on the Gigaword corpus, a 25% improvement. SynoExtractor also outperformed the Sketch Engine thesaurus for synonym extraction by 32% in terms of MAP. These promising results suggest that our method could also be used with other languages.
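The abstract reports its results as mean average precision (MAP) over ranked synonym candidates. As a rough illustration of how such a score might be computed, the following Python sketch evaluates ranked candidate lists against a gold thesaurus. The function names, the placeholder words, and the choice to normalise average precision by the number of correct candidates retrieved are assumptions made here for illustration; the abstract does not specify the exact MAP variant the authors used.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average precision of one ranked candidate list against a gold synonym set."""
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            # Precision at this rank, counted only where a correct synonym appears.
            precision_sum += hits / rank
    # Assumption: normalise by the number of correct candidates retrieved.
    # Another common variant divides by the size of the gold set instead.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    candidates: Dict[str, List[str]], thesaurus: Dict[str, Set[str]]
) -> float:
    """Mean of the per-word average precision over all query words."""
    if not candidates:
        return 0.0
    scores = [
        average_precision(ranked, thesaurus.get(word, set()))
        for word, ranked in candidates.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical placeholder data, not taken from the paper.
    extracted = {"w1": ["a", "x", "b"], "w2": ["y", "c"]}
    gold = {"w1": {"a", "b"}, "w2": {"c"}}
    print(round(mean_average_precision(extracted, gold), 3))
```

Under this definition, a candidate list scores higher when correct synonyms appear near the top of the ranking, which is why filtering non-synonyms out of a cosine-similarity ranking can raise MAP even when no new synonyms are found.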

Funder

King Saud University

Publisher

Hindawi Limited

Subject

Multidisciplinary, General Computer Science

Link

http://downloads.hindawi.com/journals/complexity/2021/6627434.pdf

References: 39 articles.

1. WordNet

2. Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering

3. Query Expansion for Myanmar Information Retrieval Used by WordNet

4. Method for automated thesaurus development in learning process support systems; I. A. Pisarev

5. Use of fuzzy logic and wordnet for improving performance of extractive automatic text summarization

Cited by 12 articles.

1. An effective deep learning based Idrcnn and Bdc-Lstm models for complex word identification and synonym generation; International Journal of Information Technology; 2024-06-23

2. Text Data Augmentation Techniques for Word Embeddings in Fake News Classification; IEEE Access; 2024

3. Review of research on synonym equivalence relation mining; 2023 International Conference on Intelligent Education and Intelligent Research (IEIR); 2023-11-05

4. A Comparative Study on Keyword Extraction and Generation of Synonyms in Natural Language Processing; 2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT); 2023-06-09

5. Enhancing Lexicon Based Sentiment Analysis Using n-gram Approach; Engineering Cyber-Physical Systems and Critical Infrastructures; 2023