MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems

Author:

Zhu Xiaoyan1ORCID,Jiang Mingyue1ORCID,Zhang Xiao-Yi2ORCID,Nie Liming3ORCID,Ding Zuohua1ORCID

Affiliation:

1. Zhejiang Sci-Tech University, Hangzhou, China

2. University of Science and Technology Beijing, Beijing, China

3. Shenzhen Technology University, Shenzhen, China

Abstract

Abstractive summarization (AS) systems, which aim to generate a text for summarizing crucial information of the original document, have been widely adopted in recent years. Unfortunately, factually unreliable summaries may still occur, leading to unexpected misunderstanding and distortion of information. This calls for methods that can properly evaluate the quality of AS systems. Yet, the existing reference-based evaluation approach for AS relies on reference summaries as well as automatic evaluation metrics (e.g., ROUGE). Therefore, the reference-based evaluation approach is highly restricted by the availability and quality of reference summaries as well as the capability of existing automatic evaluation metrics. In this study, we propose MTAS, a novel metamorphic testing based approach for evaluating AS in a reference-free way. Our two major contributions are (i) five metamorphic relations towards AS, which involve semantic-preserving and focus-preserving transformations at the document level, and (ii) a summary consistency evaluation metric SCY, which measures the alignment between a pair of summaries by incorporating both the semantic and factual consistency. Our experimental results show that the proposed metric SCY has a significantly higher correlation with human judgment as compared to a set of existing metrics. It is also demonstrated that MTAS can break the dependence on reference summaries, and it successfully reports a large number of summary inconsistencies, revealing various summarization issues on state-of-the-art AS systems.

Publisher

Association for Computing Machinery (ACM)

Reference53 articles.

1. Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. * SEM 2013 shared task: Semantic textual similarity. In Second joint conference on lexical and computational semantics (* SEM), volume 1: proceedings of the Main conference and the shared task: semantic textual similarity. 32–43.

2. Biasfinder: Metamorphic test generation to uncover bias for sentiment analysis systems;Asyrofi Muhammad Hilmi;IEEE Transactions on Software Engineering,2021

3. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. 65–72.

4. Florian Böhm Yang Gao Christian M Meyer Ori Shapira Ido Dagan and Iryna Gurevych. 2019. Better rewards yield better summaries: Learning to summarise without references. arXiv preprint arXiv:1909.01214.

5. Rishi Bommasani and Claire Cardie. 2020. Intrinsic evaluation of summarization datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 8075–8096.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3