QAScore—An Unsupervised Unreferenced Metric for the Question Generation Evaluation

Published: 2022-10-24
Volume: 24, Issue: 11, Page: 1514
ISSN: 1099-4300
Container-title: Entropy
Language: en

Author:

Ji Tianbo, Lyu Chenyang, Jones Gareth, Zhou Liting, Graham Yvette

Abstract

Question Generation (QG) aims to automate the task of composing questions for a passage, given a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has substantially improved the quality of automatically generated questions, especially compared to traditional approaches that employ manually crafted heuristics. However, current QG evaluation metrics rely solely on comparing the generated questions with references, ignoring the passages and answers. These metrics are also widely criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric called QAScore, which provides a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate the masked words in the answer to that question. In our human evaluation experiment, QAScore obtains a stronger correlation with human judgement than existing metrics such as BLEU and BERTScore, meaning that applying QAScore to the QG task yields more accurate evaluation.
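The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the one-token-at-a-time masking loop, the summation over tokens, and the sign convention (negated cross entropy, so that higher is better) are all assumptions made for illustration. A toy word-overlap function stands in for the language model, so the numbers it produces carry no meaning beyond the ranking.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: given the passage, the question, the answer tokens,
# and the index of the masked answer token, return the probability that the
# model assigns to the true token at that position.
TokenProb = Callable[[str, str, Sequence[str], int], float]


def qascore_sketch(passage: str, question: str,
                   answer_tokens: Sequence[str],
                   token_prob: TokenProb) -> float:
    """Sum of log-probabilities of the answer tokens, each masked in turn.

    This equals the negated cross entropy against the true tokens, so a
    value closer to zero means the answer is easier to recover.
    """
    total = 0.0
    for i in range(len(answer_tokens)):
        p = token_prob(passage, question, answer_tokens, i)
        total += math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return total


def toy_token_prob(passage: str, question: str,
                   answer_tokens: Sequence[str], masked_index: int) -> float:
    """Toy stand-in for a language model (assumption, illustration only).

    The probability grows with the fraction of question words that also
    occur in the passage; the masked index is ignored.
    """
    passage_words = set(passage.lower().split())
    question_words = set(question.lower().rstrip("?").split())
    if not question_words:
        return 0.1
    overlap = len(question_words & passage_words) / len(question_words)
    return 0.1 + 0.8 * overlap


passage = "Marie Curie won the Nobel Prize in Physics in 1903"
answer = ["1903"]
relevant = qascore_sketch(
    passage, "When did Marie Curie win the Nobel Prize in Physics?",
    answer, toy_token_prob)
irrelevant = qascore_sketch(
    passage, "What is the capital of France?", answer, toy_token_prob)
print(relevant > irrelevant)
```

In a real setting, `toy_token_prob` would be replaced by a pretrained language model that reads the passage, the question, and the partially masked answer. Because the score depends only on the passage, the question, and the answer, no reference question is needed.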

Funder

SFI Research Centres Programme

Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/11/1514/pdf
