Authors:
Karra Rachid, Lasfar Abdelali
Abstract
Data quality has gained increasing attention across various research domains, including pattern recognition, image processing, and Natural Language Processing (NLP). The goal of this paper is to explore the impact of data quality (of both questions and context) on Question-Answering (QA) system performance. We introduce an approach to enhance QA system results through context simplification. The strength of our methodology lies in the use of human-scale NLP models: rather than relying solely on a resource-intensive Large Language Model (LLM), the workflow combines multiple specialized models to improve the QA system's outcomes. We demonstrate that this method increases the QA system's correct response rate without modifying or retraining the model. In addition, we conduct a cross-disciplinary study involving NLP and linguistics, analyzing QA system results to show their correlation with readability and text-complexity metrics computed with Coh-Metrix. Lastly, we explore the robustness of Bidirectional Encoder Representations from Transformers (BERT) and R-NET models when confronted with noisy questions.