Author:
Fauzi M. Ali, Arifin Agus Zainal, Gosaria Sonny Christiano
Abstract
Since the rise of the World Wide Web, the amount of information available online has grown rapidly; Indonesian online news is one example. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most features are irrelevant, noisy, or redundant, which may degrade the accuracy of the system, so feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, we use Information Gain to reduce the number of features and thereby lower the complexity of MMR-FS. In the second phase, MMR-FS selects features from this reduced set. The experimental results show that our new method reaches a best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
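The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `lam` trade-off weight, and the use of mutual information between two term-presence columns as the redundancy measure are all assumptions made for illustration. Information Gain of a binary term feature with respect to the class is computed here as the mutual information between the term-presence column and the label column.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def two_phase_select(columns, labels, keep_phase1, keep_phase2, lam=0.5):
    """Hypothetical two-phase selector.

    columns: dict mapping each term to a list of 0/1 presence flags per document.
    labels:  list of class labels, one per document.
    lam:     assumed weight between relevance (lam) and redundancy (1 - lam).
    """
    # Relevance of each term = Information Gain with respect to the class.
    relevance = {t: mutual_information(col, labels) for t, col in columns.items()}

    # Phase 1: keep only the top-ranked terms by Information Gain.
    candidates = sorted(relevance, key=relevance.get, reverse=True)[:keep_phase1]

    # Phase 2: greedy MMR-style selection over the reduced candidate set.
    selected = []
    while candidates and len(selected) < keep_phase2:
        def mmr_score(term):
            # Redundancy = highest similarity to any already selected term
            # (assumed here to be mutual information between the two columns).
            redundancy = max(
                (mutual_information(columns[term], columns[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[term] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up toy data: six documents, three per class.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_columns = {
    "gol": [1, 1, 1, 0, 0, 0],    # perfectly predictive term
    "bola": [1, 1, 1, 0, 0, 0],   # exact duplicate of "gol" (redundant)
    "pasar": [0, 0, 0, 1, 1, 0],  # moderately predictive term
    "dan": [1, 0, 1, 0, 1, 0],    # nearly uninformative term
}
print(two_phase_select(toy_columns, toy_labels, keep_phase1=3, keep_phase2=2, lam=0.3))
```

In this sketch the pairwise redundancy computation in the second phase only runs over the terms that survive the first phase, which is where the complexity saving described in the abstract would come from. On the toy data, a low `lam` favours the non-duplicate term over the redundant copy in the second pick.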
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering, Control and Optimization, Computer Networks and Communications, Hardware and Architecture, Information Systems, Signal Processing
Cited by
8 articles.
1. Explainable Bengali Multiclass News Classification;2023 26th International Conference on Computer and Information Technology (ICCIT);2023-12-13
2. Sentiment Analysis in Product Reviews with Maximum Entropy and Naïve Bayes Using N-gram Method;2023 6th International Conference on Information and Communications Technology (ICOIACT);2023-11-10
3. News Headlines Classification for Disease Outbreak Detection using Modified Term Weighting approach;2023-03-23
4. Exploration of Wine features using Data Analytics;2022 International Conference on Futuristic Technologies (INCOFT);2022-11-25
5. Textual Sentimental Classification using Convolution Neural Network Algorithm;2022 International Conference on Computing, Communication and Power Technology (IC3P);2022-01