VLSP 2021-ViMRC Challenge: Vietnamese Machine Reading Comprehension

Author:

Nguyen Kiet, Tran Son Quoc, Nguyen Luan Thanh, Huynh Tin Van, Luu Son Thanh, Nguyen Ngan Luu-Thuy

Abstract

Machine reading comprehension (MRC), the task of finding answers to human questions in textual data, is an emerging research trend in natural language understanding. Although many datasets and methodologies have been developed for English and Chinese, such resources are lacking for Vietnamese, and many limitations of Vietnamese MRC remain to be addressed. Existing Vietnamese MRC datasets concentrate solely on answerable questions. In reality, however, a question can be unanswerable when the correct answer is not stated in the given text. To address this weakness, we provide the research community with UIT-ViQuAD 2.0, a benchmark dataset for evaluating MRC and question answering systems for Vietnamese. We use UIT-ViQuAD 2.0 as the benchmark for the shared task on Vietnamese machine reading comprehension (VLSP2021-MRC) at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). The task attracted 77 participant teams from 34 universities and other organizations. Each participant received training data comprising 28,457 annotated question-answer pairs and submitted results on a public test set of 3,821 questions and a private test set of 3,712 questions. In this article, we present the organization of the shared task, an overview of the methods employed by participants, and the results. The highest performances in this competition are 77.24% in F1-score and 67.43% in exact match (EM) on the private test set. The systems proposed by the top 3 teams use XLM-RoBERTa, a pre-trained language model based on the transformer architecture that has achieved state-of-the-art results on many natural language processing tasks. We believe that releasing UIT-ViQuAD 2.0 will motivate more researchers to improve Vietnamese machine reading comprehension.
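The abstract reports results in exact match (EM) and F1-score on a dataset that includes unanswerable questions. As a rough illustration of how such metrics are commonly computed for SQuAD 2.0-style data, here is a minimal Python sketch. It is not the organizers' official scorer: the whitespace-and-lowercase normalization, the convention that an empty string stands for "no answer", and the function names (normalize, exact_match, f1_score, evaluate) are all assumptions made for this example.

```python
from __future__ import annotations

from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Assumed convention: an empty answer means "unanswerable".
    # Credit is given only if both sides agree that there is no answer.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: dict[str, str],
             references: dict[str, list[str]]) -> dict[str, float]:
    """Average EM and F1 (as percentages) over all questions.

    `references` maps a question id to its gold answers; an empty list
    marks the question as unanswerable (hypothetical data layout).
    """
    em_total = 0.0
    f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")
        golds = golds or [""]  # unanswerable: the only correct output is ""
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    n = len(references)
    return {"EM": 100 * em_total / n, "F1": 100 * f1_total / n}


if __name__ == "__main__":
    preds = {"q1": "Hà Nội", "q2": "", "q3": "thành phố Hồ Chí Minh"}
    refs = {"q1": ["Hà Nội"], "q2": [], "q3": ["Hồ Chí Minh"]}
    print(evaluate(preds, refs))
```

In this sketch, a system that abstains on an unanswerable question scores full credit on both metrics, while a partially overlapping span earns F1 credit but no EM credit.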

Publisher

Vietnam National University Journal of Science

Subject

General Medicine

Cited by 3 articles.

1. INTEGRATING IMAGE FEATURES WITH CONVOLUTIONAL SEQUENCE-TO-SEQUENCE NETWORK FOR MULTILINGUAL VISUAL QUESTION ANSWERING;Journal of Computer Science and Cybernetics;2024-03-16

2. Improving Vietnamese Question-Answering system with Data Augmentation and Optimization;2023 RIVF International Conference on Computing and Communication Technologies (RIVF);2023-12-23

3. Building a Vietnamese Dataset for Natural Language Inference Models;SN Computer Science;2022-07-25
