A novel approach to capture the similarity in summarized text using embedded model

Published:2022-01-01 Issue:1 Volume:15 Page:
ISSN:1178-5608
Container-title:International Journal on Smart Sensing and Intelligent Systems
language:en

Author:

Mishra Asha Rani¹, Panchal V.K.¹

Affiliation:

1. Department of Computer Science, Al Falah University, Faridabad, Haryana, India.

Abstract

Near-duplicate textual content makes information extraction considerably harder, so detecting near duplicates is a prime research concern. Existing research mostly uses text clustering, classification and retrieval algorithms for this task. Text summarization, an important text mining tool, has not yet been explored for near-duplicate detection. Instead of using the whole document, the proposed method uses its summary, which saves both time and storage. Experimental results show that traditional similarity algorithms captured similarity to a great extent even on the summarized text, with a similarity score of 44.685%. Moreover, embedding models, which offer better text representation, captured a greater degree of similarity (0.52%) than traditional methods. The paper also reviews the research status of various similarity measures in terms of the concepts involved, their merits and their demerits.
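To make the idea of comparing summaries rather than whole documents concrete, the following is a minimal sketch of one traditional similarity measure: cosine similarity over term-frequency vectors. It is not the authors' implementation. The function names (`tokenize`, `cosine_similarity`, `is_near_duplicate`), the tokenization rule and the 0.8 threshold are assumptions chosen for illustration, and the embedding models mentioned in the abstract are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def is_near_duplicate(summary_a, summary_b, threshold=0.8):
    """Flag two summaries as near duplicates; the threshold is an illustrative assumption."""
    return cosine_similarity(summary_a, summary_b) >= threshold


if __name__ == "__main__":
    # Hypothetical summaries, used only to show the call pattern.
    s1 = "Near duplicate detection using text summaries saves time and storage."
    s2 = "Detecting near duplicates from summaries saves storage and time."
    print(round(cosine_similarity(s1, s2), 3), is_near_duplicate(s1, s2))
```

Because only the short summaries are vectorized, the vectors stay small. An embedding-based variant would replace the term-frequency vectors with dense vectors from a pretrained model and keep the same cosine step.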

Publisher

Walter de Gruyter GmbH

Subject

Electrical and Electronic Engineering, Control and Systems Engineering

Link

https://www.sciendo.com/pdf/10.2478/ijssis-2022-0002

References: 41 articles.

1. Ajees, A. P., Abrar, K. J., Sumam, M. I. and Sreenathan, M. 2021. A deep level tagger for Malayalam, a morphologically rich language. Journal of Intelligent Systems 30(1): 115–129.

2. Albalawi, R., Yeap, T. H. and Benyoucef, M. 2020. Using topic modeling methods for short-text data: a comparative analysis. Frontiers in Artificial Intelligence 3. Available at: https://doi.org/10.3389/frai.2020.00042.

3. Alqahtani, A., Alhakami, H., Alsubait, T. and Baz, A. 2021. A survey of text matching techniques. Engineering, Technology & Applied Science Research 11(1): 6656–6661. doi: 10.48084/etasr.3968.

4. Alqrainy, S. and Alawairdhi, M. 2021. Towards developing a comprehensive tag set for the Arabic language. Journal of Intelligent Systems 30(1): 287–296.

5. Al-Subaihin, A., Sarro, F. and Black, S. 2019. Empirical comparison of text-based mobile apps similarity measurement techniques. Empirical Software Engineering 24: 3290–3315.

Cited by 3 articles.

1. Machine learning model for chatGPT usage detection in students’ answers to open-ended questions: Case of Lithuanian language;Education and Information Technologies;2024-03-09

2. FinKENet: A Novel Financial Knowledge Enhanced Network for Financial Question Matching;Entropy;2023-12-26

3. TokenDoc: Source Authentication With a Hybrid Approach of Smart Contract and RNN-Based Models With AES Encryption;IEEE Transactions on Engineering Management;2023