MultiSumm: Towards a Unified Model for Multi-Lingual Abstractive Summarization

Published: 2020-04-03  Issue: 01  Volume: 34  Pages: 11-18
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Authors:

Cao Yue, Wan Xiaojun, Yao Jinge, Yu Dian

Abstract

Automatic text summarization aims to produce a shorter version of the input text that conveys its most important information. However, multi-lingual text summarization, where the goal is to process texts in multiple languages and output summaries in the corresponding languages with a single model, has rarely been studied. In this paper, we present MultiSumm, a novel multi-lingual model for abstractive summarization. The MultiSumm model uses the following training regime: (I) multi-lingual learning, which comprises language model training, auto-encoder training, and translation and back-translation training; and (II) joint summary generation training. We conduct experiments on summarization datasets for five rich-resource languages (English, Chinese, French, Spanish, and German) and two low-resource languages (Bosnian and Croatian). Experimental results show that our proposed model significantly outperforms a multi-lingual baseline model. In particular, our model achieves performance comparable to, or even better than, that of models trained separately on each language. As an additional contribution, we construct the first summarization dataset for Bosnian and Croatian, containing 177,406 and 204,748 samples, respectively.
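The abstract names the components of the MultiSumm training regime but not how they are scheduled. As a hedged illustration only, the Python sketch below lays the two stages out as an ordered task list. The objective labels, the `build_schedule` helper, and the strict "all of Stage I, then Stage II" ordering are hypothetical assumptions, not the authors' implementation; the abstract also does not say which language pairs the translation objectives use, so pairing is left out.

```python
from itertools import product

# The seven languages named in the abstract, as ISO 639-1 codes.
LANGUAGES = ["en", "zh", "fr", "es", "de", "bs", "hr"]

# Stage I objectives listed in the abstract; the labels are hypothetical.
STAGE_ONE_OBJECTIVES = ["language_model", "auto_encoder", "translation", "back_translation"]


def build_schedule(languages):
    """Return an ordered list of (stage, objective, language) tasks.

    Assumption (not stated in the abstract): every Stage I objective is run
    for every language before Stage II starts. For the translation
    objectives, `language` marks one side only; pairing is left unspecified.
    """
    # Stage I: multi-lingual learning, visiting each objective for each language.
    schedule = [("I", objective, lang)
                for objective, lang in product(STAGE_ONE_OBJECTIVES, languages)]
    # Stage II: joint summary generation, one shared model over all languages.
    schedule += [("II", "summary_generation", lang) for lang in languages]
    return schedule


if __name__ == "__main__":
    tasks = build_schedule(LANGUAGES)
    print(len(tasks))  # 4 objectives x 7 languages + 7 summarization tasks = 35
    print(tasks[0], tasks[-1])
```

Interleaving the objectives within each batch, rather than running them in sequence, would be an equally plausible reading of the abstract; the list form simply makes the two stages and seven languages explicit.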

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 8 articles.

1. A topic modeling-based bibliometric exploration of automatic summarization research; WIREs Data Mining and Knowledge Discovery; 2024-04-25

2. End to End Urdu Abstractive Text Summarization With Dataset and Improvement in Evaluation Metric; IEEE Access; 2024

3. Text Summarisation for Low-Resourced Languages, A Review; Communications in Computer and Information Science; 2024

4. MCLS: A Large-Scale Multimodal Cross-Lingual Summarization Dataset; Lecture Notes in Computer Science; 2023

5. Cross-lingual Machine Translation: An Analysis Model for Low Resource Languages; Lecture Notes in Networks and Systems; 2023