Topical and Non-Topical Approaches to Measure Similarity between Arabic Questions

Published:2022-08-22 Issue:3 Volume:6 Page:87
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC

Author:

Daoud Mohammad

Abstract

Questions are crucial expressions in any language. Many Natural Language Processing (NLP) and Natural Language Understanding (NLU) applications, such as question-answering systems, chatbots, digital virtual assistants, and opinion mining, can benefit from accurately identifying similar questions. We detail methods for identifying similarities between Arabic questions posted online by Internet users and organizations. Our novel approach combines a non-topical, rule-based methodology with topical information (textual, lexical, and semantic similarity) to determine whether two Arabic questions are paraphrases of each other. Our method measures the lexical and linguistic distances between the two questions in each pair. Additionally, it classifies questions by format and scope using expert hypotheses (rules) that have been shown experimentally to be useful and practical. For example, a When question (Timex Factoid, asking about time) and a Who question (Enamex Factoid, asking about a named entity) are not judged similar even when their lexical similarity is high. In an experiment using 2200 question pairs, our method attained an accuracy of 0.85, which is remarkable given the simplicity of the solution and the fact that we did not employ any language models or word embeddings. To cover the questions commonly asked by Arabic Internet users, we gathered the questions from various online forums and resources. In this study, we describe a unique method for detecting question similarity that does not require intensive processing, a sizable linguistic corpus, or a costly semantic repository. Because rich Arabic textual resources are scarce, this is especially important for processing informal Arabic text on the Internet.
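The combination described in the abstract, a question-type rule applied alongside a lexical score, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the interrogative table `QUESTION_TYPES`, the first-token typing rule, the use of Jaccard overlap as the lexical measure, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical mapping from Arabic interrogatives to coarse question types.
# The paper's actual rule set is not given in the abstract.
QUESTION_TYPES = {
    "متى": "TIMEX",     # when
    "من": "ENAMEX",     # who
    "أين": "LOCATION",  # where
    "كم": "NUMEX",      # how many / how much
}


def tokenize(question):
    """Replace common Latin/Arabic punctuation with spaces, then split on whitespace."""
    return re.sub(r"[؟?.,!،]", " ", question).split()


def question_type(tokens):
    """Assumed rule: the first token decides the type; anything else is OTHER."""
    if tokens and tokens[0] in QUESTION_TYPES:
        return QUESTION_TYPES[tokens[0]]
    return "OTHER"


def jaccard(a, b):
    """Lexical overlap: |A ∩ B| / |A ∪ B| over the two token sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def are_similar(q1, q2, threshold=0.5):
    """Non-topical gate first (types must match), then the topical score."""
    t1, t2 = tokenize(q1), tokenize(q2)
    if question_type(t1) != question_type(t2):
        return False  # e.g. a When question never matches a Who question
    return jaccard(t1, t2) >= threshold


if __name__ == "__main__":
    when_q = "متى اخترع الهاتف؟"        # When was the telephone invented?
    who_q = "من اخترع الهاتف؟"          # Who invented the telephone?
    when_q2 = "متى اخترع جهاز الهاتف؟"  # When was the telephone device invented?
    print(are_similar(when_q, who_q))    # rejected by the type rule
    print(are_similar(when_q, when_q2))  # same type, high overlap
```

In this sketch the When/Who pair shares two of four distinct tokens, so a purely lexical threshold of 0.5 would accept it; the type check is what rejects the pair, mirroring the When/Who example in the abstract.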

Publisher

MDPI AG

Subject

Artificial Intelligence, Computer Science Applications, Information Systems, Management Information Systems

Link

https://www.mdpi.com/2504-2289/6/3/87/pdf

Cited by 1 article.

1. A Mirror to Human Question Asking: Analyzing the Akinator Online Question Game;Big Data and Cognitive Computing;2023-01-29