Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair

Published:2022-03-31 Issue:2 Volume:21 Page:1-16
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Fatima Ghazeefa¹, Nawab Rao Muhammad Adeel¹, Khan Muhammad Salman¹, Saeed Ali²

Affiliation:

1. Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Punjab, Pakistan

2. Department of Software Engineering, The University of Lahore, Lahore, Punjab, Pakistan

Abstract

Semantic word similarity is a quantitative measure of how contextually similar two words are. Evaluating semantic word similarity models requires a benchmark corpus. However, despite the millions of Urdu speakers and the large volume of Urdu digital text on the Internet, benchmark corpora for the Cross-lingual Semantic Word Similarity task are lacking for Urdu. This article reports our efforts in developing such a corpus. The newly developed corpus is based on the SemEval-2017 Task 2 English dataset and contains 1,945 cross-lingual English–Urdu word pairs. For each word pair, semantic similarity scores were assigned by 11 native Urdu speakers. In addition to corpus generation, this article reports the evaluation results of a baseline approach, namely “Translation Plus Monolingual Analysis,” for automated identification of semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performs better for the Google- and Bing-translated words. The newly created corpus and evaluation results are freely available online for further research and development.
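
The baseline named in the abstract, translating one word and then scoring the pair monolingually with a path length measure, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy taxonomy, the `URDU_TO_ENGLISH` lookup table (a stand-in for a machine translation service), and the scoring formula 1 / (1 + shortest path length), which is the common WordNet-style definition and may differ from what the authors actually used.

```python
from collections import deque

# Hypothetical toy taxonomy (undirected hypernym links); not from the paper.
TAXONOMY_EDGES = [
    ("entity", "animal"),
    ("entity", "vehicle"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("vehicle", "car"),
]

# Hypothetical stand-in for a machine translation step (Urdu -> English).
URDU_TO_ENGLISH = {"کتا": "dog", "بلی": "cat", "گاڑی": "car"}


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


GRAPH = build_graph(TAXONOMY_EDGES)


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, word_a, word_b):
    """Assumed formula: 1 / (1 + shortest path length between the words)."""
    dist = shortest_path_length(graph, word_a, word_b)
    return None if dist is None else 1.0 / (1.0 + dist)


def cross_lingual_similarity(english_word, urdu_word, graph=GRAPH):
    """Translate the Urdu word, then score the pair monolingually."""
    translated = URDU_TO_ENGLISH.get(urdu_word)
    if translated is None:
        return None  # no translation available in the toy table
    return path_similarity(graph, english_word, translated)


if __name__ == "__main__":
    for urdu_word in URDU_TO_ENGLISH:
        print("dog", urdu_word, cross_lingual_similarity("dog", urdu_word))
```

In this sketch a pair whose translation matches the English word exactly scores 1.0, and the score falls as the two words sit farther apart in the taxonomy. A real evaluation would compare such scores against the human ratings in the corpus.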

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3472618


Cited by 2 articles.

1. Construction of English Semantic Analysis System Based on AI Technology;2024 Second International Conference on Data Science and Information System (ICDSIS);2024-05-17

2. Numerical Analysis and Optimization of English Reading Corpus for Feature Extraction;Wireless Communications and Mobile Computing;2022-09-06