Affiliation:
1. National Institute of Technology, Raipur, India
2. Samsung Research Institute, Noida, India
Abstract
Enormous volumes of records and data are gathered every day, and organizing this data is a challenging task. Topic modeling provides a way to categorize documents, but the high dimensionality of the corpus degrades the output of the topic model, which makes it important to apply feature selection or information retrieval techniques for dimensionality reduction. Efficient topic modeling requires removing unrelated words, which might otherwise lead to spurious co-occurrence of unrelated words. This paper proposes an efficient framework for generating topics with better coherence, in which term frequency-inverse document frequency (TF-IDF) and a parsimonious language model (PLM) are used for the information retrieval task. PLM extracts the important information and expels general words from the corpus, whereas TF-IDF re-estimates the weight of each word in the corpus. The work carried out in this paper improves the topic coherence measure, providing a better correlation between the actual topics and the topics generated from PLM.
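To make the two weighting steps named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact pipeline, so the sketch assumes a plain TF-IDF (relative term frequency times log inverse document frequency) and the commonly used EM formulation of a parsimonious language model. The function names `tf_idf` and `parsimonious_lm`, the toy corpus, and the values of `lam`, `iterations`, and `threshold` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def parsimonious_lm(doc, corpus_prob, lam=0.1, iterations=50, threshold=1e-4):
    """Estimate P(term | doc) with EM, pushing mass away from general terms.

    corpus_prob maps each term to its background probability P(term | corpus).
    lam is the (assumed) weight of the document model in the mixture.
    """
    counts = Counter(doc)
    total = sum(counts.values())
    p_doc = {t: c / total for t, c in counts.items()}  # maximum-likelihood start
    for _ in range(iterations):
        # E-step: expected count of each term explained by the document model
        expected = {
            t: counts[t] * lam * p / ((1 - lam) * corpus_prob[t] + lam * p)
            for t, p in p_doc.items()
        }
        norm = sum(expected.values())
        # M-step: renormalize the expected counts into a distribution
        p_doc = {t: e / norm for t, e in expected.items()}
        # Prune terms whose probability fell below the threshold, then renormalize
        kept = {t: p for t, p in p_doc.items() if p >= threshold}
        if not kept:
            break
        kept_norm = sum(kept.values())
        p_doc = {t: p / kept_norm for t, p in kept.items()}
    return p_doc


# Toy corpus (hypothetical): "the" occurs in every document.
docs = [
    ["the", "topic", "model", "the", "corpus"],
    ["the", "network", "the", "signal"],
    ["the", "topic", "coherence", "the", "words"],
]
corpus_counts = Counter(term for doc in docs for term in doc)
corpus_total = sum(corpus_counts.values())
corpus_prob = {t: c / corpus_total for t, c in corpus_counts.items()}

weights = tf_idf(docs)
plm = parsimonious_lm(docs[0], corpus_prob)
```

In this toy corpus, "the" appears in every document, so its TF-IDF weight is zero, and the PLM estimate for the first document shifts probability away from "the" toward document-specific terms such as "model". A topic model could then be trained on the reduced, re-weighted vocabulary; how the paper combines the two steps is not stated in the abstract.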
Subject
Computer Networks and Communications, Information Systems
Cited by
4 articles.