Affiliation:
1. Dr. C. V. Raman University, Bilaspur, India
Abstract
Text mining applies data mining techniques to extract valuable information hidden in textual data. In this paper, we propose a framework for fuzzy clustering of news articles that originate from different news portals on the web. The collected articles are stored in a central database and then pre-processed to reduce noise. Keyword extraction identifies the keywords in each article, and a word-frequency vector is generated from them. A distance or similarity measure is applied to these vectors to determine the similarity between articles. Because one article may belong to more than one cluster, a fuzzy context vector must be generated; Mutual Information can be used to compute the fuzzy membership values. Threshold values are required to identify the clusters. The proposed framework does not restrict each news article to exactly one cluster. Therefore, when this framework is applied to information retrieval systems or other applications, it can give better performance and more relevant results to users.
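The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The stop-word list, the cluster prototype texts, the function names, and the threshold values are all hypothetical. Fuzzy membership is approximated here by normalized cosine similarity, a simpler stand-in for the Mutual Information measure the abstract mentions.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def term_frequencies(text):
    """Pre-process text and return its word-frequency vector as a Counter."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(u, v):
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuzzy_memberships(article_vec, cluster_vecs):
    """Membership of one article in each cluster, normalized to sum to 1.

    Assumption: similarity to a cluster prototype stands in for the
    Mutual Information based membership named in the abstract.
    """
    sims = {name: cosine_similarity(article_vec, vec)
            for name, vec in cluster_vecs.items()}
    total = sum(sims.values())
    if total == 0:
        return {name: 0.0 for name in sims}
    return {name: s / total for name, s in sims.items()}


def assign_clusters(memberships, threshold):
    """Return every cluster whose membership reaches the threshold."""
    return sorted(name for name, m in memberships.items() if m >= threshold)


# Hypothetical cluster prototypes and a sample article.
clusters = {
    "sports": term_frequencies("football match goal team player"),
    "finance": term_frequencies("market stock bank money investor"),
}
article = term_frequencies("The team owner sold stock to fund a new player")
memberships = fuzzy_memberships(article, clusters)
print(memberships)
print(assign_clusters(memberships, threshold=0.3))
```

With a threshold of 0.3 the sample article is assigned to both clusters, while a stricter threshold of 0.5 keeps only the sports cluster. This shows how the threshold controls whether an article may belong to more than one cluster.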