Towards High Performance Text Mining

Published:2016-04 Issue:2 Volume:8 Page:58-75
ISSN:1938-0259
Container-title:International Journal of Grid and High Performance Computing
language:en

Author:

Yu Shanshan¹,Su Jindian²,Li Pengfei²,Wang Hao³

Affiliation:

1. College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China

2. College of Computer Science and Engineering, South China University of Technology, Guangzhou, China

3. Norwegian University of Science and Technology in Aalesund, Aalesund, Norway

Abstract

As a typical unsupervised learning method, the TextRank algorithm performs well for large-scale text mining, especially for automatic summarization and keyword extraction. However, TextRank considers only the similarities between sentences during automatic summarization and neglects information about text structure and context. To overcome these shortcomings, the authors propose an improved, highly scalable method called iTextRank. When building the TextRank graph in the new method, the authors compute sentence similarities and adjust the node weights using statistical and linguistic features, such as similarity to titles, paragraph structure, special sentences, and sentence position and length. Their analysis shows that the time complexity of iTextRank is comparable to that of TextRank. More importantly, two experiments show that iTextRank has a higher accuracy and lower recall rate than TextRank, and that it is as effective as several popular online automatic summarization systems.
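The abstract does not give iTextRank's formulas, so the following is only a minimal Python sketch of the general idea: a baseline TextRank sentence ranking followed by a feature-based adjustment of node scores. The function names, the set-based word-overlap similarity, and the `title_weight` and `position_weight` parameters are all assumptions made for illustration and are not taken from the paper.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(a, b):
    """Word-overlap similarity in the style of TextRank (simplified to unique words)."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if len(wa) < 2 or len(wb) < 2:
        # Very short sentences are skipped; this also avoids a zero denominator.
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Baseline weighted TextRank scores, one per sentence."""
    n = len(sentences)
    sim = [
        [0.0 if i == j else overlap_similarity(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the edge weight, normalized by the neighbour's total weight.
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def adjusted_scores(sentences, title, title_weight=0.5, position_weight=0.3):
    """Hypothetical adjustment: boost sentences that resemble the title or appear early."""
    base = textrank(sentences)
    n = len(sentences)
    adjusted = []
    for i, (sentence, score) in enumerate(zip(sentences, base)):
        title_sim = overlap_similarity(sentence, title)
        position = 1.0 - i / n  # earlier sentences get a larger bonus
        adjusted.append(
            score * (1 + title_weight * title_sim + position_weight * position)
        )
    return adjusted
```

Calling `adjusted_scores(sentences, title)` returns one score per sentence; taking the highest-scoring sentences in document order would give an extractive summary. Because the adjustment is a single pass over the sentences, it does not change the overall cost of the graph construction and iteration, which is in line with the abstract's claim that the time complexity stays comparable to TextRank.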

Publisher

IGI Global

Subject

Computer Networks and Communications

Cited by 10 articles.

1. Review of ambiguity problem in text summarization using hybrid ACA and SLR;Intelligent Systems with Applications;2024-06

2. Query-Based Extractive Text Summarization Using Sense-Oriented Semantic Relatedness Measure;Arabian Journal for Science and Engineering;2023-08-18

3. Classifying Fault Category and Severity of UAV Flight Controllers’ Reported Issues;2022 6th International Conference on System Reliability and Safety (ICSRS);2022-11-23

4. Implementation of Automatic Text Summarization with TextRank Method in the Development of Al-Qur’an Vocabulary Encyclopedia;Procedia Computer Science;2021

5. Summarization of Coal Mine Accident Reports: A Natural-Language-Processing-Based Approach;Communications in Computer and Information Science;2020