Evaluating the performance of multilingual models in answer extraction and question generation

Authors:

Antonio Moreno-Cediel, Jesus-Angel del-Hoyo-Gabaldon, Eva Garcia-Lopez, Antonio Garcia-Cabot, David de-Fitero-Dominguez

Abstract

Multiple-choice test generation is one of the most complex NLP problems, especially in languages other than English, where prior research is scarce. A review of the literature shows that earlier approaches, such as rule-based systems and early neural networks, have given way to the Transformer architecture for the tasks of Answer Extraction (AE) and Question Generation (QG). This study therefore focuses on finding and developing better models for AE and QG in Spanish, using an answer-aware methodology. For this purpose, three multilingual models (mT5-base, mT0-base and BLOOMZ-560M) were fine-tuned on three different datasets: a Spanish translation of the SQuAD dataset; SQAC, a native Spanish dataset; and their union (SQuAD + SQAC), which yields slightly better results. The performance of mT5-base was compared with that of two newer models, mT0-base and BLOOMZ-560M, which had already been fine-tuned for multiple tasks in the literature, including AE and QG; in general, however, the best results come from the mT5 models trained in this study on the SQuAD + SQAC dataset, with some further good results from mT5 models trained only on SQAC. For evaluation, the widely used BLEU-1 to BLEU-4, METEOR and ROUGE-L metrics were computed, on which mT5 outperforms some comparable studies. In addition, CIDEr, SARI, GLEU, WER and cosine similarity were calculated to provide a benchmark for future work on the AE and QG problems.
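To make the answer-aware setup concrete, below is a minimal sketch of answer-aware question generation with an mT5 checkpoint, using the Hugging Face transformers library. The prompt layout ("answer: ... context: ...") and the helper generate_question are illustrative assumptions, not the paper's exact fine-tuning recipe; the untuned google/mt5-base checkpoint stands in for the fine-tuned models described in the abstract.

```python
# Hypothetical sketch: answer-aware QG in Spanish with mT5-base.
# Assumes the Hugging Face `transformers` library; the prompt format
# below is an illustrative convention, not the paper's stated one.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/mt5-base"  # base checkpoint, before fine-tuning

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def generate_question(context: str, answer: str) -> str:
    """Encode the answer together with its context (the 'answer-aware'
    setup) and decode one candidate question."""
    source = f"answer: {answer} context: {context}"
    inputs = tokenizer(source, return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_question(
    "La arquitectura Transformer fue presentada en 2017.",
    "2017",
))
```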
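The surface-overlap evaluation named in the abstract (BLEU-1 to BLEU-4, METEOR, ROUGE-L, WER, among others) can be reproduced in spirit with off-the-shelf metric implementations. The sketch below uses the Hugging Face evaluate library as a stand-in, since the paper's exact tooling is not stated here, and the example strings are invented; a real run would score generated questions against the dataset references.

```python
# Hedged sketch of the evaluation metrics, via the `evaluate` library.
import evaluate

predictions = ["¿En qué año se presentó la arquitectura Transformer?"]
references = [["¿Cuándo fue presentada la arquitectura Transformer?"]]

bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
rouge = evaluate.load("rouge")
wer = evaluate.load("wer")

# BLEU-1 ... BLEU-4 via the max_order parameter.
for n in range(1, 5):
    score = bleu.compute(predictions=predictions,
                         references=references, max_order=n)
    print(f"BLEU-{n}:", score["bleu"])

print("METEOR:", meteor.compute(predictions=predictions,
                                references=references)["meteor"])
print("ROUGE-L:", rouge.compute(predictions=predictions,
                                references=references)["rougeL"])
# WER takes one reference string per prediction and returns a float.
print("WER:", wer.compute(predictions=predictions,
                          references=[r[0] for r in references]))
```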

Publisher

Springer Science and Business Media LLC
