Author:
Venkata Sailaja N.,Padmasree L.,Mangathayaru N.
Abstract
PurposeText mining has been used for various knowledge discovery based applications, and thus, a lot of research has been contributed towards it. Latest trending research in the text mining is adopting the incremental learning data, as it is economical while dealing with large volume of information.Design/methodology/approachThe primary intention of this research is to design and develop a technique for incremental text categorization using optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps, such as pre-processing, feature selection, classification and feature extraction. Initially, the data is pre-processed based on stop word removal and stemming. Then, the feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using Bhattacharya distance measure and the features are subjected as the input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded in a limit using rough set theory. Moreover, for the optimal selection of weights in SVNN, Moth Search (MS) algorithm is used. Thus, the proposed classifier, named Rough set MS-SVNN, performs the text categorization for the incremental data, given as the input.FindingsFor the experimentation, the 20 News group dataset, and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN has achieved 0.7743, 0.7774 and 0.7745 for the precision, recall andF-measure, respectively.Originality/valueIn this paper, an online incremental learner is developed for the text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies the incoming texts based on the boundary condition evaluated by the Rough set theory, and the optimal weights from the MS. The proposed online text categorization scheme has the basic steps, like pre-processing, feature extraction, feature selection and classification. The pre-processing is carried out to identify the unique words from the dataset, and the features like semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for the classification.
Subject
Library and Information Sciences,Information Systems
Reference31 articles.
1. Using the Tsetlin machine to learn human-interpretable rules for high-accuracy text categorization with medical applications;IEEE Access,2019
2. Italian text categorization with lemmatization and Support vector machines;Neural Approaches to Dynamics of Signal Exchanges,2019
3. Multi kernel and dynamic fractional lion optimization algorithm for data clustering;Alexandria Engineering Journal,2018
4. Incremental learning for text document classification,2007
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献