Abstract
Word embeddings enhance pseudo-relevance feedback query expansion (PRFQE), but training word embedding models is time-consuming and typically requires large datasets. Moreover, training embedding models requires special processing for languages with rich vocabularies and complex morphological structures, such as Arabic. This paper proposes training such models on a representative subset of a dataset and defines the conditions of representativeness. Using a suitable subset of words to train a word embedding model is effective because it dramatically decreases the training time while preserving retrieval effectiveness. This paper shows that the subset of words carrying the prefix ‘AL,’ the AL-Definite words, represents the TREC 2001/2002 dataset; for example, training the SkipGram word embedding model on the AL-Definite words of this dataset takes only 10% of the time required by the whole dataset. The trained models are used to embed words for different Arabic query expansion scenarios, and the proposed training method proves effective: it outperforms ordinary PRFQE by at least 7% in Mean Average Precision (MAP) and 14.5% in precision at the 10th returned document (P10). Moreover, the improvement over retrieval without query expansion is 21.7% in MAP and 21.32% in P10. The results show no significant differences between the word embedding models used for Arabic query expansion.
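As a rough illustration of the idea summarized above (a sketch, not the authors' actual pipeline), the snippet below filters a tokenized Arabic corpus down to the AL-Definite words, i.e. tokens beginning with the definite article ال (AL), trains a gensim SkipGram Word2Vec model on that reduced subset, and uses the nearest neighbours of query terms as expansion candidates. The corpus variable, parameter values, and the expand_query helper are hypothetical; in the paper the embeddings are combined with pseudo-relevance feedback rather than used for standalone expansion.

```python
# Minimal sketch: train SkipGram on the AL-Definite subset of a corpus.
# Assumptions: gensim is installed; `corpus` is a list of tokenized Arabic
# documents; all parameter values are illustrative only.
from gensim.models import Word2Vec

AL_PREFIX = "\u0627\u0644"  # the Arabic definite article "al-" (alef + lam)

def al_definite_subset(corpus):
    """Keep only the AL-Definite tokens of each document."""
    return [[tok for tok in doc if tok.startswith(AL_PREFIX)] for doc in corpus]

# Toy corpus (hypothetical); in practice this would be the full collection.
corpus = [["الكتاب", "مفيد", "الطالب"], ["العلم", "نور", "المعرفة"]]

# Train SkipGram (sg=1) on the reduced corpus instead of the whole dataset.
model = Word2Vec(
    sentences=al_definite_subset(corpus),
    vector_size=100,   # embedding dimension (illustrative)
    window=5,
    min_count=1,
    sg=1,              # 1 = SkipGram architecture
    workers=4,
)

def expand_query(query_terms, topn=3):
    """Hypothetical embedding-based expansion: append nearest neighbours
    of each query term that exists in the trained vocabulary."""
    expansion = []
    for term in query_terms:
        if term in model.wv:
            expansion.extend(w for w, _ in model.wv.most_similar(term, topn=topn))
    return query_terms + expansion
```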
Publisher
Research Square Platform LLC