Combining co-clustering with noise detection for theme-based summarization

Author:

Cai Xiaoyan1, Li Wenjie2, Zhang Renxian3

Affiliation:

1. Northwest Agricultural and Forestry University, Shaanxi, China

2. The Hong Kong Polytechnic University, Hung Hom, Hong Kong

3. The Hong Kong Polytechnic University and Samsung Electronics Research Center, China

Abstract

Because sentences are short and carry limited content, we treat words as independent text objects rather than as features of sentences in sentence clustering. We develop two co-clustering frameworks, integrated clustering and interactive clustering, that cluster sentences and words simultaneously. Since real-world datasets always contain noise, we incorporate noise detection and removal to improve the clustering of both sentences and words. In addition, we explore a semisupervised approach that incorporates the query information (and the sentences in earlier document sets) into theme-based summarization. We conduct thorough experiments. On the DUC 2005-2007 and TAC 2008-2009 datasets, the two noise-detecting co-clustering approaches perform comparably to the top three systems. The results also show that the noise-detecting interactive algorithm is more effective than the noise-detecting integrated algorithm.
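The abstract does not specify how the integrated or interactive frameworks work internally. The sketch below is a rough illustration of the general idea of clustering sentences and words together while setting noisy sentences aside. It alternates between two steps:

- assigning each word to the sentence cluster it occurs in most often;
- reassigning each sentence to the word cluster that covers most of its distinct words.

A sentence is flagged as noise when no single word cluster covers at least `noise_threshold` of its words. The function name `interactive_cocluster`, the majority-vote updates, and the threshold rule are hypothetical choices for this example, not the authors' algorithm. The initial labels `init` must be supplied by the caller, and only sentence-level noise is modeled.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def interactive_cocluster(
    sentences: Sequence[Sequence[str]],
    init: Sequence[Optional[int]],
    n_iter: int = 10,
    noise_threshold: float = 0.5,
) -> Tuple[List[Optional[int]], Dict[str, int]]:
    """Hypothetical toy alternating co-clustering of sentences and words.

    sentences: tokenized sentences; init: initial cluster id per sentence.
    Returns (sentence_labels, word_labels); a sentence label of None marks noise.
    """
    sent_labels: List[Optional[int]] = list(init)
    word_labels: Dict[str, int] = {}
    for _ in range(n_iter):
        # Word step: each word joins the sentence cluster in which it occurs in
        # the most sentences; sentences currently flagged as noise do not vote.
        counts: Dict[str, Counter] = {}
        for tokens, label in zip(sentences, sent_labels):
            if label is None:
                continue
            for w in set(tokens):
                counts.setdefault(w, Counter())[label] += 1
        new_word_labels = {
            # ties go to the smallest cluster id so results are deterministic
            w: min(c, key=lambda j: (-c[j], j))
            for w, c in counts.items()
        }
        # Sentence step: each sentence joins the word cluster covering the largest
        # share of its distinct words; coverage below the threshold marks noise.
        new_sent_labels: List[Optional[int]] = []
        for tokens in sentences:
            vocab = set(tokens)
            votes = Counter(new_word_labels[w] for w in vocab if w in new_word_labels)
            if not votes:
                new_sent_labels.append(None)
                continue
            best = min(votes, key=lambda j: (-votes[j], j))
            share = votes[best] / len(vocab)
            new_sent_labels.append(best if share >= noise_threshold else None)
        converged = new_sent_labels == sent_labels and new_word_labels == word_labels
        sent_labels, word_labels = new_sent_labels, new_word_labels
        if converged:
            break
    return sent_labels, word_labels


sents = [
    ["stock", "market", "fell"],
    ["market", "stock", "rose"],
    ["rain", "storm", "flood"],
    ["storm", "rain", "wind"],
    ["stock", "rain"],  # mixes both themes
]
labels, words = interactive_cocluster(sents, init=[0, 0, 1, 1, 0], noise_threshold=0.6)
print(labels)  # [0, 0, 1, 1, None]
```

The last sentence splits its words evenly between the two themes. It is therefore set aside as noise rather than forced into either cluster, and it stops influencing the word clusters on the next pass.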

Funder

National Natural Science Foundation of China

Central Universities

Research Grants Council, University Grants Committee, Hong Kong

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Mathematics, Computer Science (miscellaneous)

Cited by 5 articles.

1. A Probabilistic Approach for Extractive Summarization Based on Clustering Cum Graph Ranking Method;IEEE Access;2024

2. BERT-based ensemble model for Hindi summarization;2022 International Interdisciplinary Humanitarian Conference for Sustainability (IIHC);2022-11-18

3. Extractive summarization in Hindi using BERT-based ensemble model;2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT);2022-10-03

4. COSUM: Text summarization based on clustering and optimization;Expert Systems;2018-10-12

5. Discovering the core semantics of event from social media;Future Generation Computer Systems;2016-11
