TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique

Published:2020-03-05 Issue:4 Volume:12 Page:811-833
ISSN:1866-9956
Container-title:Cognitive Computation
language:en
Short-container-title:Cogn Comput

Author:

Rabby Gollam, Azad Saiful, Mahmud Mufti, Zamli Kamal Z., Rahman Mohammed Mostafizur

Abstract

Automatic keyphrase extraction techniques aim to extract quality keyphrases for higher-level summarization of a document. Most existing techniques are domain-specific: they require application domain knowledge, employ higher-order statistical methods, are computationally expensive, and require large amounts of training data, which are rare for many applications. To overcome these issues, this paper proposes a new unsupervised keyphrase extraction technique. The proposed technique, named TeKET (Tree-based Keyphrase Extraction Technique), is domain-independent, employs limited statistical knowledge, and requires no training data. It also introduces a new variant of a binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. In addition, a measure called the Cohesiveness Index (CI) is derived, which denotes a given node's degree of cohesiveness with respect to the root. The CI is used to flexibly extract final keyphrases from the KePhEx tree and is also used in the ranking process. The effectiveness of the proposed technique and its domain and language independence are experimentally evaluated using available benchmark corpora, namely SemEval-2010 (a scientific articles dataset), Theses100 (a thesis dataset), and a German Research Article dataset. The results are compared with other relevant unsupervised techniques, both statistical and graph-based. They show that the proposed technique outperforms the compared techniques in terms of precision, recall, and F1 scores.
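The abstract reports results in terms of precision, recall, and F1 but does not say how extracted keyphrases are matched against gold-standard ones. As a minimal sketch, assuming exact matching after lowercasing and whitespace normalization (an assumption, not necessarily the paper's protocol), the three scores for one document could be computed as follows. The function name `keyphrase_prf` and the sample phrases are hypothetical.

```python
def keyphrase_prf(extracted, gold):
    """Exact-match precision, recall, and F1 for one document.

    Assumption: phrases match if they are equal after lowercasing and
    collapsing whitespace. Stemming or partial matching is not applied.
    """
    def normalize(phrase):
        return " ".join(phrase.lower().split())

    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    matched = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = matched / len(extracted_set) if extracted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical example data, for illustration only.
extracted = ["Keyphrase extraction", "binary tree", "neural network"]
gold = ["keyphrase extraction", "binary tree",
        "cohesiveness index", "unsupervised learning"]
precision, recall, f1 = keyphrase_prf(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Corpus-level figures are usually obtained by averaging such per-document scores or by pooling the match counts. Which of the two the paper uses is not stated in the abstract.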

Funder

Universiti Malaysia Pahang

Publisher

Springer Science and Business Media LLC

Subject

Cognitive Neuroscience, Computer Science Applications, Computer Vision and Pattern Recognition

Link

http://link.springer.com/content/pdf/10.1007/s12559-019-09706-3.pdf
