Cross-lingual extreme summarization of scholarly documents

Authors

Takeshita Sotaro, Green Tommaso, Friedrich Niklas, Eckert Kai, Ponzetto Simone Paolo

Abstract

The number of scientific publications is increasing rapidly, causing information overload for researchers and making it hard for scholars to keep up with current trends and lines of work. Recent work has tried to address this problem by developing methods for automated summarization in the scholarly domain, but has so far concentrated only on monolingual settings, primarily English. In this paper, we therefore explore how state-of-the-art neural abstractive summarization models based on a multilingual encoder–decoder architecture can be used to produce cross-lingual extreme summaries of scholarly texts. To this end, we compile a new abstractive cross-lingual summarization dataset for the scholarly domain in four languages, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage pipeline approach that summarizes and translates independently, as well as a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training, using English monolingual summarization and machine translation as intermediate tasks, and analyze performance in zero- and few-shot scenarios. Finally, we investigate how to make our approach more efficient using knowledge distillation methods, which make it possible to shrink our models and thereby reduce the computational complexity of summarization inference.

Funder

Deutsche Forschungsgemeinschaft

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences
