Affiliation:
1. Department of Computer Science, University of Peshawar, KPK, Pakistan
Abstract
Over the last decade, sentiment analysis of languages such as English and Chinese has been a particular focus of attention, but resource-poor languages such as Urdu have been largely ignored by the research community; this research addresses that gap. Data covering about 14 different genres were acquired from various blogs and then annotated by human annotators. Three well-known classifiers, namely Support Vector Machine, Decision Tree and k-Nearest Neighbor (k-NN), are tested and their outputs compared. Their results are then improved over several iterations through a number of steps that include stop-word removal, feature extraction, and the identification and extraction of important features. Initially, the performance of the classifiers is not satisfactory, as the accuracy of all three is below 50%. An ensemble of classifiers is also tried, but it does not yield high accuracy. The results are analyzed carefully, and the improvements made, including feature extraction, raise the performance of these classifiers to a satisfactory level. It is further concluded that k-NN performs better than Support Vector Machine and Decision Tree in terms of accuracy, precision, recall and F-measure.
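The abstract only names the processing steps, so the following is a minimal Python sketch of one way such a pipeline could look, not the authors' implementation. It assumes whitespace tokenization, a small hypothetical stop-word list, raw term-frequency features, cosine similarity as the k-NN similarity measure, k = 3, and a binary positive/negative label set. None of these choices, nor the function names (tokenize, bag_of_words, knn_predict, evaluate), come from the paper. The toy sentences are romanized Urdu written purely for illustration and are not drawn from the annotated blog corpus; since the tokenizer only splits on whitespace, it would work the same way on Urdu-script text.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical stop-word list (romanized Urdu) for illustration only;
# the stop-word list used in the paper is not given in the abstract.
STOP_WORDS = {"hai", "hain", "ka", "ki", "ke", "aur", "mein", "yeh", "woh"}


def tokenize(text: str) -> list[str]:
    """Split on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens: list[str]) -> Counter:
    """Sparse term-frequency feature vector (term -> count)."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train: list[tuple[Counter, str]], text: str, k: int = 3) -> str:
    """Label `text` by majority vote of its k most similar training documents."""
    query = bag_of_words(tokenize(text))
    # sorted() is stable, so ties keep their original training order.
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(gold: list[str], pred: list[str], positive: str) -> tuple[float, float, float, float]:
    """Accuracy, plus precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_measure


# Toy, hand-written training sentences (not from the paper's corpus).
TRAIN_TEXTS = [
    ("yeh film bohat achi hai", "positive"),        # "this film is very good"
    ("yeh kitab bohat khoobsurat hai", "positive"), # "this book is very beautiful"
    ("khana mazedar aur acha tha", "positive"),     # "the food was tasty and good"
    ("yeh film bekar hai", "negative"),             # "this film is useless"
    ("kitab bohat buri hai", "negative"),           # "the book is very bad"
    ("khana bekar aur thanda tha", "negative"),     # "the food was useless and cold"
]
TRAIN = [(bag_of_words(tokenize(text)), label) for text, label in TRAIN_TEXTS]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "film bohat achi"))  # positive
    print(knn_predict(TRAIN, "khana bekar"))      # negative
    print(evaluate(["positive", "negative"], ["positive", "negative"], "positive"))
```

Trying other feature extractors or a different k only requires changing bag_of_words or the k argument. Because evaluate reports precision, recall and F-measure for a single chosen class, a setting with more than two sentiment classes could compute it per class and average the results.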
Funder
the Shaanxi Science and Technology Innovation Scheme
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
Cited by
48 articles.