SummEval: Re-evaluating Summarization Evaluation

Published:2021 Volume:9 Page:391-409
ISSN:2307-387X
Container-title:Transactions of the Association for Computational Linguistics
language:en

Author:

Fabbri Alexander R.¹,Kryściński Wojciech²,McCann Bryan³,Xiong Caiming⁴,Socher Richard⁵,Radev Dragomir⁶⁷

Affiliation:

1. Yale University, United States. alexander.fabbri@yale.edu

2. Salesforce Research, United States. kryscinski@salesforce.com

3. Salesforce Research, United States. bryan.mccann.is@gmail.com

4. Salesforce Research, United States. cxiong@salesforce.com

5. Salesforce Research, United States. richard@socher.org

6. Yale University, United States

7. Salesforce Research, United States. dragomir.radev@yale.edu

Abstract

The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations; 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics; 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format; 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics; and 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/DailyMail dataset annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.

Publisher

MIT Press - Journals

Subject

Artificial Intelligence,Computer Science Applications,Linguistics and Language,Human-Computer Interaction,Communication

Link

http://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00373/1923949/tacl_a_00373.pdf

References: 79 articles.

1. Neural machine translation by jointly learning to align and translate;Bahdanau;arXiv preprint arXiv:1409.0473,2014

2. Better rewards yield better summaries: Learning to summarise without references;Böhm,2019

3. STRASS: A light and effective method for extractive summarization based on sentence embeddings;Bouscarrat,2019

4. The price of debiasing automatic metrics in natural language evaluation;Chaganty,2018

5. Fast abstractive summarization with reinforce-selected sentence rewriting;Chen,2018

Cited by 107 articles.

1. Factual consistency evaluation of summarization in the Era of large language models;Expert Systems with Applications;2024-11

2. KEMoS: A knowledge-enhanced multi-modal summarizing framework for Chinese online meetings;Neural Networks;2024-10

3. Closing the gap between open source and commercial large language models for medical evidence summarization;npj Digital Medicine;2024-09-09

4. Summarizing long scientific documents through hierarchical structure extraction;Natural Language Processing Journal;2024-09

5. Evaluation metrics on text summarization: comprehensive survey;Knowledge and Information Systems;2024-08-31