An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics

Published:2022-12-23 Issue:8 Volume:55 Page:1-35
ISSN:0360-0300
Container-title:ACM Computing Surveys
language:en
Short-container-title:ACM Comput. Surv.

Author:

Koh Huan Yee¹, Ju Jiaxin¹, Liu Ming², Pan Shirui¹

Affiliation:

1. Monash University, Australia

2. Deakin University, Australia

Abstract

Long documents such as academic articles and business reports have been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader’s comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3545176

References: 138 articles.

1. SciBERT: A Pretrained Language Model for Scientific Text

2. Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150. Retrieved from https://arxiv.org/abs/2004.05150.

3. Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu, and Graham Neubig. 2020. Re-evaluating evaluation in text summarization. arXiv:2010.07100. Retrieved from https://arxiv.org/abs/2010.07100.

4. Intrinsic Evaluation of Summarization Datasets

5. A Survey on NLP based Text Summarization for Summarizing Product Reviews

Cited by 31 articles.

1. Neural natural language processing for long texts: A survey on classification and summarization;Engineering Applications of Artificial Intelligence;2024-07

2. Graphs in clusters: a hybrid approach to unsupervised extractive long document summarization using language models;Artificial Intelligence Review;2024-06-29

3. Improving ROUGE‐1 by 6%: A novel multilingual transformer for abstractive news summarization;Concurrency and Computation: Practice and Experience;2024-06-10

4. Graph spatiotemporal process for multivariate time series anomaly detection with missing values;Information Fusion;2024-06

5. A Multimetric Approach for Evaluation of ChatGPT-Generated Text Summaries;IEEE Engineering Management Review;2024-06