Enhanced Language Modeling with Proximity and Sentence Relatedness Information for Extractive Broadcast News Summarization

Authors:

Liu Shih-Hung (1), Chen Kuan-Yu (2), Chen Berlin (3)

Affiliations:

1. Delta Management System, Taipei City, Taiwan (R.O.C.)

2. National Taiwan University of Science and Technology, Taipei City, Taiwan (R.O.C.)

3. National Taiwan Normal University; ASUS Intelligent Cloud Service Center (AICS)

Abstract

The primary task of extractive summarization is to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important theme of the original document. Recently, language modeling (LM) has been proven to be a promising modeling framework for performing this task in an unsupervised manner. However, three fundamental challenges still face the existing LM-based methods, which we set out to tackle in this article. The first is how to construct a more accurate sentence model in this framework without resorting to external sources of information. The second is how to take into account sentence-level structural relationships, in addition to word-level information within a document, for important sentence selection. The last is how to exploit the proximity cues inherent in sentences to obtain a more accurate estimation of the respective sentence models. Specifically, for the first and second challenges, we explore a novel, principled approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to render sentence-level structural relationships within the document, leading to better summarization effectiveness. For the third challenge, we investigate several formulations of proximity cues for use in sentence modeling involved in the LM-based summarization framework, free of the strict bag-of-words assumption. Furthermore, we also present various ensemble methods that seamlessly integrate proximity and sentence relatedness information into sentence modeling. Extensive experiments conducted on a Mandarin broadcast news summarization task show that such integration of proximity and sentence relatedness information is indeed beneficial for speech summarization. Our proposed summarization methods can significantly boost the performance of an LM-based strong baseline (e.g., with a maximum ROUGE-2 improvement of 26.7% relative) and also outperform several state-of-the-art unsupervised methods compared in the article.
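The LM-based framework the abstract builds on ranks each sentence by how well a (smoothed) unigram language model estimated from that sentence can generate the whole document. The sketch below illustrates this general criterion only; the function name, the Jelinek-Mercer smoothing weight `alpha`, and the scoring details are illustrative assumptions, not the paper's exact formulation (which further refines the sentence models with clustering and proximity cues).

```python
import math
from collections import Counter

def lm_sentence_scores(doc_sentences, alpha=0.5):
    """Score each sentence (a list of tokens) by the log-likelihood of the
    whole document under that sentence's smoothed unigram model.
    Higher score = sentence model explains the document better.
    Illustrative sketch of the generic LM-based summarization criterion."""
    doc_counts = Counter(w for s in doc_sentences for w in s)
    doc_len = sum(doc_counts.values())
    scores = []
    for sent in doc_sentences:
        sent_counts = Counter(sent)
        sent_len = len(sent) or 1
        score = 0.0
        for w, c in doc_counts.items():
            # Jelinek-Mercer smoothing: interpolate the sentence model
            # with the document model so unseen words get nonzero mass.
            p = alpha * sent_counts[w] / sent_len + (1 - alpha) * c / doc_len
            score += c * math.log(p)
        scores.append(score)
    return scores
```

A summarizer would then pick the top-ranked sentences until a length budget is met; the paper's contribution is to sharpen the sentence-model estimate `p` with overlapped-cluster relatedness and positional proximity weights rather than raw within-sentence counts alone.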

Funder

Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan

National Science Council, Taiwan

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 8 articles.

1. Analyzing the Effects of Transcription Errors on Summary Generation of Bengali Spoken Documents;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-08-16

2. NLP Transformers: Analysis of LLMs and Traditional Approaches for Enhanced Text Summarization;Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi;2024-04-22

3. Deep neural architecture for natural language image synthesis for Tamil text using BASEGAN and hybrid super resolution GAN (HSRGAN);Scientific Reports;2023-09-02

4. Discovering technology and science innovation opportunity based on sentence generation algorithm;Journal of Informetrics;2023-05

5. GA-SCS: Graph-Augmented Source Code Summarization;ACM Transactions on Asian and Low-Resource Language Information Processing;2023-02-21
