Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization

Author:

Khan Atif¹, Shah Qaiser¹, Uddin M. Irfan², Ullah Fasee³, Alharbi Abdullah⁴, Alyami Hashem⁵, Gul Muhammad Adnan¹

Affiliation:

1. Department of Computer Science, Islamia College Peshawar, Peshawar, KP, Pakistan

2. Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan

3. Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China

4. Department of Information Technology, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia

5. Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia

Abstract

A large share of the data on the web comes from discussion forums, which contain millions of threads. Discussion threads are a valuable source of knowledge for Internet users because they cover numerous topics. A discussion thread on a single topic often comprises a huge number of reply posts. This makes it hard for forum users to scan all the replies and identify the most relevant ones, and equally hard to manually summarize the bulk of reply posts to get the gist of the discussion. Automatically extracting the most relevant replies from a discussion thread and combining them into a summary is therefore a challenging task. Motivated by this, this study proposes a sentence embedding based clustering approach for discussion thread summarization. The proposed approach works as follows. First, a word2vec model is employed to represent the reply sentences in the discussion thread as sentence embeddings (sentence vectors). Next, the K-medoids clustering algorithm is applied to group semantically similar reply sentences, reducing redundancy among overlapping reply sentences. Finally, various quality text features are used to rank the reply sentences within each cluster, and the highest-ranked reply sentences are selected from all clusters to form the thread summary. Two standard forum datasets are used to assess the effectiveness of the proposed approach. Empirical results confirm that the proposed sentence embedding based clustering approach outperforms the other summarization methods compared in terms of mean precision, recall, and F-measure.
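The three-stage pipeline described in the abstract (embed, cluster, rank and select) can be sketched as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. A tiny hand-written dictionary of word vectors stands in for a trained word2vec model. Sentence vectors are assumed to be the average of their word vectors. K-medoids is a simple alternating (assign, then re-pick medoid) variant, not necessarily the one used in the paper. Because the abstract does not list the quality text features, ranking here uses a single hypothetical feature: cosine similarity to the thread's mean vector. All function names (`sentence_embedding`, `k_medoids`, `summarize`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_embedding(sentence: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens (assumed sentence-vector scheme)."""
    tokens = [t for t in sentence.lower().split() if t in word_vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(word_vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def k_medoids(vectors: Sequence[Vector], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Simple alternating K-medoids; returns one cluster label per vector."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each sentence joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance to its cluster members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid if the cluster emptied
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def summarize(sentences: List[str], word_vectors: Dict[str, Vector], dim: int, k: int) -> List[str]:
    """Pick the top-ranked sentence from each cluster (hypothetical centrality feature)."""
    vectors = [sentence_embedding(s, word_vectors, dim) for s in sentences]
    labels = k_medoids(vectors, k)
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    picked = []
    for c in range(k):
        members = [i for i in range(len(sentences)) if labels[i] == c]
        if members:
            # Rank by closeness to the whole-thread mean vector; keep the best one.
            picked.append(min(members, key=lambda i: cosine_distance(vectors[i], centroid)))
    return [sentences[i] for i in sorted(picked)]  # keep original thread order


if __name__ == "__main__":
    # Toy 2-d "word2vec" vectors: axis 0 ~ drivers/installation, axis 1 ~ pricing.
    toy_vectors = {
        "install": [1.0, 0.0], "driver": [0.9, 0.1], "update": [1.0, 0.1],
        "price": [0.0, 1.0], "cost": [0.1, 0.9], "cheap": [0.0, 1.0],
    }
    replies = ["install the driver", "update driver", "price is high", "cost is cheap"]
    print(summarize(replies, toy_vectors, dim=2, k=2))
```

Selecting one sentence per cluster is what lets clustering reduce redundancy: near-duplicate replies land in the same cluster, so at most one of them reaches the summary. In a real setting the toy dictionary would be replaced by vectors from a trained word2vec model, and the single centrality score by the paper's combination of quality text features.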

Funder

Islamia College, Peshawar

Publisher

Hindawi Limited

Subject

Multidisciplinary, General Computer Science

Cited by 8 articles.

1. Exploring Community Awareness of Mangrove Ecosystem Preservation through Sentence-BERT and K-Means Clustering;Information;2024-03-14

2. Text Summarization for Online and Blended Learning;Scalable Computing: Practice and Experience;2024-02-24

3. Visually Guided Network Reconstruction Using Multiple Embeddings;2023 IEEE 16th Pacific Visualization Symposium (PacificVis);2023-04

4. Contextual Word Embedding based Clustering for Extractive Summarization;2022 International Conference on Frontiers of Information Technology (FIT);2022-12

5. Interactive optimization of embedding-based text similarity calculations;Information Visualization;2022-08-03
