An Extractive Question Answering System for the Tamil Language

Published: 2023-02-27
ISSN: 1662-0356
Container-title: IoT, Cloud and Data Science

Author:

Krishnan Aravind¹, Sriram Srinivasa Ramanujan¹, Ganesan Balaji Vishnu Raj¹, Sridhar S.¹

Affiliation:

1. SRM Institute of Science and Technology

Abstract

Question answering is a core task in Natural Language Processing that has attracted considerable attention. With the development of multiple language models, question answering systems have been built and deployed to improve information retrieval. These systems, however, have largely been implemented only for English. Our objective was to create such a question answering system for the Tamil language. We chose XLM-RoBERTa, which has been trained on a variety of datasets, as our language model, and we used a hand-annotated dataset for validation. We trained the model on two kinds of datasets: the first contained only Tamil, and the second mixed Tamil with other Indian languages. The results were satisfactory in both cases. Because training the model required a large amount of computational power, we used the Colab Pro Plus cloud GPU from Google. We will also publish our dataset on Hugging Face so that fellow researchers can use it for further analysis.
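
As a rough illustration of what "extractive" means here, the sketch below shows the span-selection step commonly used on top of encoder models such as XLM-RoBERTa: the model scores every token as a possible answer start and answer end, and the answer is the span with the highest combined score. This is a minimal sketch, not the authors' implementation. The function names `best_span` and `extract_answer`, the `max_answer_len` parameter, and the hand-written logits are all assumptions made for illustration; in a real system the logits would come from the fine-tuned model.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring span with start <= end.

    The score of a span is start_logits[start] + end_logits[end].
    `max_answer_len` (a hypothetical parameter) caps the span length in tokens.
    """
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must be non-empty and equal in length")

    best = (0, 0, float("-inf"))
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap.
        last = min(len(end_logits), start + max_answer_len)
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


def extract_answer(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> str:
    """Join the tokens of the best span into an answer string."""
    start, end, _ = best_span(start_logits, end_logits, max_answer_len)
    return " ".join(tokens[start : end + 1])


if __name__ == "__main__":
    # Context: "Chennai is the capital of Tamil Nadu", split into words.
    # The logits are made up for this example, not model output.
    tokens = ["சென்னை", "தமிழ்நாட்டின்", "தலைநகரம்", "ஆகும்"]
    start_logits = [4.0, 0.5, 0.2, 0.1]
    end_logits = [3.5, 0.3, 1.0, 0.2]
    print(extract_answer(tokens, start_logits, end_logits))
```

The exhaustive search over start and end positions is quadratic in the context length, which is acceptable for short passages; practical systems usually restrict the search to the top few start and end candidates.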

Publisher

Trans Tech Publications Ltd

Link

https://www.scientific.net/AST.124.312.pdf

References: 15 articles.

1. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang, SQuAD: 100,000+ Questions for Machine Comprehension of Text, arXiv:1606.05250 [cs], June (2016).

2. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Attention Is All You Need, arXiv:1706.03762 [cs], June (2017).

3. Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale, arXiv:1911.02116 [cs], Nov. (2019).

4. Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk, MLQA: Evaluating Cross-lingual Extractive Question Answering, arXiv:1910.07475 [cs], Oct. (2019).

5. Mikel Artetxe, Sebastian Ruder, Dani Yogatama, On the Cross-lingual Transferability of Monolingual Representations, arXiv:1910.11856 [cs], Oct. (2019).

Cited by 1 article.

1. Context-Aware Auto-Encoded Graph Neural Model for Dynamic Question Generation using NLP, ACM Transactions on Asian and Low-Resource Language Information Processing, 2023-10-05.