Authors:
Shah Himat, Ahmed Dr. Shafique, Sathio Anwar Ali, Burdi Dr Asadullah
Abstract
This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. We first extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. We then assign a score to each candidate based on its appearance in linguistic and DOM-based feature sets. In the final step, we rank the candidates by score, select the top 5 keyphrases from each list, and combine them into the final set of keyphrases for the webpage. We emphasize the relevance of the keyphrases to the page content by using linguistic features. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on the performance of our previous method, D-rank, and outperforms other state-of-the-art methods.
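The rank-and-select step and the evaluation metrics described above can be illustrated with a minimal sketch. It is not the W-rank scoring itself: raw frequency stands in for the linguistic and DOM-based feature scores, noun-phrase candidates are omitted because they would need a part-of-speech tagger, and the stop-word list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to", "we", "on"}

def candidates(text):
    """Build unigram and bigram candidate lists from plain text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = [t for t in tokens if t not in STOPWORDS]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])
               if a not in STOPWORDS and b not in STOPWORDS]
    return unigrams, bigrams

def top_k(cands, k=5):
    """Rank candidates by a placeholder score (raw frequency), keep the top k."""
    return [phrase for phrase, _ in Counter(cands).most_common(k)]

def extract_keyphrases(text, k=5):
    """Take the top k from each candidate list and combine them."""
    unigrams, bigrams = candidates(text)
    return top_k(unigrams, k) + top_k(bigrams, k)

def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F-score against gold keyphrases."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Swapping the frequency count in `top_k` for a feature-based score would leave the selection and evaluation logic unchanged, which is why the two are kept separate here.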