Affiliation:
1. Manav Rachna International Institute of Research & Studies, Faridabad, India
2. Manav Rachna International Institute of Research & Studies, Faridabad, India
Abstract
Aims:
To examine feature selection techniques for text data drawn from heterogeneous sources for sentiment
classification.
Objectives:
The objective of this work is to analyze feature selection techniques for text gathered from different sources in order to
increase the accuracy of sentiment classification on microblogs.
Methods:
Three feature selection techniques, Bag-of-Words (BOW), TF-IDF, and word2vec, were applied to find the most
suitable one for heterogeneous datasets.
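To make the difference between the first two techniques concrete, the following is a minimal sketch in plain Python of Bag-of-Words counts and TF-IDF weights. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the helper names (tokenize, bag_of_words, tf_idf), and the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for short texts from different sources.
docs = ["good phone good battery", "bad battery", "good service"]


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bag_of_words(documents):
    """Return a sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(documents):
    """Return the vocabulary and TF-IDF vectors (relative TF times smoothed IDF)."""
    vocab, counts = bag_of_words(documents)
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # guard against empty documents
        weighted.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, weighted


vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)

for doc, counts_row, weights_row in zip(docs, bow, tfidf):
    print(doc)
    print("  BOW   :", dict(zip(vocab, counts_row)))
    print("  TF-IDF:", {t: round(w, 3) for t, w in zip(vocab, weights_row) if w > 0})
```

In this sketch, Bag-of-Words treats "bad" and "battery" in the second text identically (one occurrence each), whereas TF-IDF gives "bad" the larger weight because it appears in fewer texts of the corpus. Either kind of vector could then be passed to a classifier such as an SVM.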
Results:
Of the three selected feature selection techniques, TF-IDF performs best for sentiment classification with an SVM
classifier.
Conclusion:
Feature selection is an integral part of any data preprocessing task, and it also helps machine learning algorithms
achieve good classification accuracy. It is therefore essential to find the most suitable approach for data from
heterogeneous sources. Such sources are rich in information and play an important role in developing models for
adaptable systems. With this in mind, we compared the three techniques on heterogeneous source data and found that
TF-IDF is the most suitable for all types of data, whether balanced or imbalanced, single-source or multiple-source.
In all cases, TF-IDF is the most promising approach for classifying user sentiment.
Publisher
Bentham Science Publishers Ltd.
Cited by
1 article.