Cross-Domain Document Summarization Model via Two-Stage Curriculum Learning

Published: 2024-08-29
Journal: Electronics, Volume 13, Issue 17, Page 3425
ISSN: 2079-9292
Language: English

Authors:

Lee Seungsoo¹, Kim Gyunyeop¹, Kang Sangwoo¹

Affiliation:

1. School of Computing, Gachon University, 1342, Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea

Abstract

Generative document summarization is a natural language processing technique that generates short summary sentences while preserving the content of long texts. Various pre-trained document summarization models have been proposed that are fine-tuned on a single, specific text-summarization dataset. However, each text-summarization dataset usually specializes in a particular downstream task, so no single dataset can cover every case that spans multiple domains. Accordingly, a generative document summarization model fine-tuned on a specific dataset performs well on that dataset, but its performance degrades by up to 45% on datasets not used during training. In short, summarization models perform well on in-domain inputs, where the training and evaluation datasets share the same domain, but perform poorly on out-of-domain inputs. In this paper, we propose a new curriculum-learning method that uses mixed datasets while training a generative summarization model, making it more robust on out-of-domain datasets. Compared with the baseline model, our method performed better on XSum and showed 10%, 20%, and 10% lower performance degradation on CNN/DM, one of the two test datasets used.
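The abstract reports out-of-domain drops as percentages ("up to 45%", "10%, 20%, and 10% lower"). The minimal sketch below shows one way such figures could be computed. It assumes that degradation is the relative drop of an evaluation score (for example ROUGE) from in-domain to out-of-domain evaluation, and that "lower degradation" is a difference in percentage points; neither definition is confirmed by this record. The function names and sample scores are hypothetical and are not results from the paper.

```python
# Hypothetical helpers for quantifying out-of-domain degradation.
# Assumption: degradation = relative drop of a score (e.g. ROUGE) from
# in-domain to out-of-domain evaluation; not taken from the paper.


def relative_degradation(in_domain_score: float, out_domain_score: float) -> float:
    """Return the in-domain -> out-of-domain drop as a percentage of the in-domain score."""
    if in_domain_score <= 0:
        raise ValueError("in-domain score must be positive")
    return 100.0 * (in_domain_score - out_domain_score) / in_domain_score


def degradation_reduction(baseline_drop: float, method_drop: float) -> float:
    """Return how many percentage points less degradation a method shows than a baseline.

    Assumption: "lower degradation" is read as a percentage-point difference.
    """
    return baseline_drop - method_drop


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    baseline = relative_degradation(in_domain_score=40.0, out_domain_score=22.0)
    method = relative_degradation(in_domain_score=40.0, out_domain_score=26.0)
    print(f"baseline drop: {baseline:.1f}%")  # baseline drop: 45.0%
    print(f"method drop:   {method:.1f}%")  # method drop:   35.0%
    print(f"reduction:     {degradation_reduction(baseline, method):.1f} points")  # reduction:     10.0 points
```

If the paper instead reports the reduction relative to the baseline drop, `degradation_reduction` would divide by `baseline_drop`; the percentage-point reading is used here only because it is the simpler interpretation.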

Funder

National Research Foundation of Korea

Gachon University

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/17/3425/pdf
