EHMMQA: English, Hindi, and Marathi multilingual question answering framework using deep learning

Published: 2024-05-24; Pages: 1-29
ISSN: 2977-0424
Journal: Natural Language Processing
Language: English
Short title: Nat. lang. processing

Authors:

Pawan Lahoti, Namita Mittal, Girdhari Singh

Abstract

Multilingual question answering (MQA) provides effective access to multilingual data, returning accurate and precise answers irrespective of language. Although a wide range of datasets is available for monolingual QA in natural language processing, benchmark datasets designed specifically for MQA remain scarce, and the absence of comprehensive benchmarks hinders the development and evaluation of MQA systems. To address this gap, this work develops EHMQuAD, an MQA dataset covering the low-resource languages Hindi and Marathi alongside English. EHMQuAD is built with a synthetic corpus generation approach, and alignment is performed after translation to improve the dataset's accuracy. The work further proposes EHMMQA, a deep neural network framework that accepts question-context pairs and returns an answer to each question. To build this system, separate shared question and shared context representations are designed. Experiments are conducted on the MMQA, Translated SQuAD, XQuAD, MLQA, and EHMQuAD datasets, with exact match (EM) and F1-score as performance measures. EHMMQA is compared with state-of-the-art MQA baseline models in all possible monolingual and multilingual settings. The results indicate that EHMMQA is a considerable step toward MQA systems for Hindi and Marathi, establishing a new state of the art for these languages.
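The abstract names EM and F1-score as the performance measures without defining them. As a hedged illustration, the sketch below implements the common SQuAD-style definitions: EM is 1 when the normalized prediction equals a normalized reference answer, and F1 is the harmonic mean of token-level precision and recall, with the best score taken over all reference answers. The `normalize` function is an assumption, not the paper's: it lowercases, strips ASCII punctuation plus the Devanagari danda (U+0964) and double danda (U+0965), and collapses whitespace. It deliberately skips SQuAD's English-article removal, and the exact normalization used for EHMQuAD or MLQA may differ.

```python
from collections import Counter
import string

# Assumption: ASCII punctuation plus the Devanagari danda (U+0964) and double danda (U+0965).
PUNCT = set(string.punctuation) | {"\u0964", "\u0965"}


def normalize(text: str) -> str:
    """Simplified, hypothetical normalizer: lowercase, drop punctuation,
    collapse whitespace. Unlike SQuAD's official script, it does not
    strip English articles."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 over whitespace tokens of the normalized strings."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, golds: list[str]) -> tuple[float, float]:
    """Best EM and best F1 of one prediction against any reference answer."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(f1_score(prediction, g) for g in golds)
    return em, f1


if __name__ == "__main__":
    print(score("ताज महल।", ["ताज महल"]))          # (1.0, 1.0): danda is stripped
    print(score("the Taj Mahal", ["Taj Mahal"]))  # EM 0.0, F1 about 0.8
```

Corpus-level EM and F1 would then be the averages of these per-question scores. Because whitespace tokenization is applied uniformly, the same code runs on English, Hindi, and Marathi text, though a language-aware tokenizer could change the F1 values.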

Publisher

Cambridge University Press (CUP)
