
Research paper classification systems based on TF-IDF and LDA schemes

Published: 2019-08-26   Volume: 9   Issue: 1
ISSN: 2192-1962
Journal: Human-centric Computing and Information Sciences
Language: English
Abbreviated journal title: Hum. Cent. Comput. Inf. Sci.

Authors:

Kim Sang-Woon, Gil Joon-Min

Abstract

With the rapid advance of computer and information technologies, numerous research papers have been published both online and offline, and as new research fields continually emerge, users have considerable trouble finding and categorizing the research papers they are interested in. To overcome this limitation, this paper proposes a research paper classification system that clusters research papers into meaningful classes in which the papers are very likely to share similar subjects. The proposed system extracts representative keywords from the abstract of each paper, along with topics obtained by the Latent Dirichlet allocation (LDA) scheme. Then, the K-means clustering algorithm is applied to group all the papers into clusters of papers with similar subjects, based on the term frequency-inverse document frequency (TF-IDF) values of each paper.
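The abstract describes a pipeline in which each paper is represented by TF-IDF weights and then grouped with K-means. The following is a minimal pure-Python sketch of those two steps, not the authors' implementation. Several things are assumptions: tf is the raw count divided by document length, idf is ln(N/df), the small stopword list and the deterministic farthest-point seeding are illustrative choices, and the four sample "abstracts" are made up. The LDA topic-extraction step is omitted here.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "by", "for", "of", "over", "the", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return (vocab, vectors) with tf = count / doc length and idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def top_keywords(vocab, vector, n):
    """Highest-weighted terms of one document; ties broken alphabetically."""
    order = sorted(range(len(vocab)), key=lambda i: (-vector[i], vocab[i]))
    return [vocab[i] for i in order[:n]]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point seeding: start from the first vector,
    then repeatedly add the vector farthest from every chosen centroid."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-means on TF-IDF vectors; returns one cluster label per document."""
    centroids = init_centroids(vectors, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "Latent Dirichlet allocation topic model for text documents",
    "Topic model inference with Dirichlet priors over documents",
    "K means clustering of image pixels by color",
    "Clustering image pixels with K means and color features",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)                              # [0, 0, 1, 1]
print(top_keywords(vocab, vectors[0], 3))  # ['allocation', 'latent', 'text']
```

With k=2, the two topic-modelling sentences and the two image-clustering sentences land in separate clusters, and the highest TF-IDF terms of the first document are the ones that appear in no other document. The deterministic seeding is used only to make the output reproducible; a library implementation such as scikit-learn's `KMeans` would typically use randomized k-means++ seeding instead.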

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science

Link

https://link.springer.com/content/pdf/10.1186/s13673-019-0192-7.pdf

References: 41 articles

1. Bafna P, Pramod D, Vaidya A (2016) Document clustering: TF-IDF approach. In: IEEE int. conf. on electrical, electronics, and optimization techniques (ICEEOT). pp 61–66

2. Ramos J (2003) Using TF-IDF to determine word relevance in document queries. In: Proc. of the first int. conf. on machine learning

3. Havrlant L, Kreinovich V (2017) A simple probabilistic explanation of term frequency-inverse document frequency (TF-IDF) heuristic (and variations motivated by this explanation). Int J Gen Syst 46(1):27–36

4. Trstenjak B, Mikac S, Donko D (2014) KNN with TF-IDF based framework for text categorization. Procedia Eng 69:1356–1364

5. Yau C-K et al (2014) Clustering scientific documents with topic modeling. Scientometrics 100(3):767–786

Cited by 183 articles.

1. Digital threads of architectural heritage: navigating tourism destination image through social media reviews and machine learning insights. Journal of Asian Architecture and Building Engineering, 2024-09-09

2. Looking back to move forward: shedding light on the dark side of entrepreneurship. New England Journal of Entrepreneurship, 2024-08-26

3. Conservation and Protection Treatments for Cultural Heritage: Insights and Trends from a Bibliometric Analysis. Coatings, 2024-08-13

4. The delayed and combinatorial response of online public opinion to the real world: An inquiry into news texts during the COVID-19 era. Humanities and Social Sciences Communications, 2024-08-10

5. Scientific publications clustering using textual and citation information. Expert Systems with Applications, 2024-08