A mixed approach of statistical weighting method and unsupervised method to improve Uyghur sentiment classification
Published: 2021-09-28
Issue: 4
Volume: 21
Page: 829–851
ISSN: 1472-7978
Container-title: Journal of Computational Methods in Sciences and Engineering
Short-container-title: JCM
Author:
Yalkun Erpan¹, Slamu Wushour¹, Turhuntay Raxida²
Affiliation:
1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
2. College of Electronic and Information Engineering, Yili Normal University, Yili, Xinjiang 835000, China
Abstract
Given the scarcity of Uyghur sentiment resources, this paper proposes a new combined unsupervised sentiment classification method for Uyghur text that requires no labeled corpora. In the first stage, a Uyghur sentiment dictionary, UYSentiDict, was used to classify the sentences. Sentiment-word matching considered both the original word form and its stem, and the method further accounted for the influence of sentence patterns, negation words, and degree adverbs. Using different thresholds, the sentences with higher sentiment values were selected from the lexicon-based classification results to form a pseudo-labeled dataset. In the second stage, a machine learning classifier learned sentiment features from this pseudo-labeled dataset and then classified the remaining sentences. Experiments on Uyghur sentiment corpora from four different domains show that the proposed method achieves good classification performance, and some of its results are better than those of a machine learning classifier. Moreover, the method is not restricted to a particular domain and does not require a pre-labeled training corpus, so it can effectively alleviate the resource shortage in Uyghur sentiment classification.
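The two-stage idea in the abstract (lexicon scoring with stem fallback, negation and degree handling, threshold-based pseudo-labeling, then a learned classifier for the rest) can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and several parts are assumptions: the lexicon, negation words, degree adverbs, and suffix list are tiny hypothetical English placeholders standing in for UYSentiDict and Uyghur morphology; sentence-pattern effects are not modeled; the threshold of 1.5 is arbitrary; and multinomial naive Bayes is swapped in as the second-stage classifier purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy resources: English placeholders standing in for
# UYSentiDict, Uyghur negation words, degree adverbs, and suffixes.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}
SUFFIXES = ("est", "er", "s")  # tried in order when the full word is unknown


def word_score(word):
    """Polarity of a word: try the original form first, then a stripped stem."""
    if word in LEXICON:
        return LEXICON[word]
    for suf in SUFFIXES:
        stem = word[: -len(suf)]
        if word.endswith(suf) and stem in LEXICON:
            return LEXICON[stem]
    return 0.0


def sentence_score(tokens):
    """Sum polarities; a negation flips and a degree word scales the next sentiment word."""
    total, negate, weight = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATIONS:
            negate = not negate
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        else:
            s = word_score(tok)
            if s != 0.0:
                total += (-s if negate else s) * weight
                negate, weight = False, 1.0  # modifiers are consumed here


    return total


def train_nb(labeled):
    """Multinomial naive Bayes counts (a stand-in second-stage classifier)."""
    class_counts = Counter(label for _, label in labeled)
    word_counts = {c: Counter() for c in class_counts}
    for toks, label in labeled:
        word_counts[label].update(toks)
    vocab = {w for wc in word_counts.values() for w in wc}
    return class_counts, word_counts, vocab


def predict_nb(model, toks):
    """Most probable class under add-one smoothing; unseen words are ignored."""
    class_counts, word_counts, vocab = model
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, cc in class_counts.items():
        total = sum(word_counts[c].values())
        lp = math.log(cc / n)
        for w in toks:
            if w in vocab:
                lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def is_confident(score, threshold):
    return score != 0.0 and abs(score) >= threshold


def classify(sentences, threshold=1.5):
    scores = [sentence_score(t) for t in sentences]
    # Stage 1: keep only high-magnitude lexicon decisions as pseudo-labels.
    train = [(t, "pos" if s > 0 else "neg")
             for t, s in zip(sentences, scores) if is_confident(s, threshold)]
    if len({lab for _, lab in train}) < 2:
        raise ValueError("pseudo-labeled set must contain both classes")
    # Stage 2: learn from the pseudo-labels and classify the remaining sentences.
    model = train_nb(train)
    return [("pos" if s > 0 else "neg") if is_confident(s, threshold)
            else predict_nb(model, t)
            for t, s in zip(sentences, scores)]


docs = [
    "the food was very good".split(),
    "service was awful and slow".split(),
    "the room was great and clean".split(),
    "not good at all and slow".split(),
    "clean room".split(),
    "slow service".split(),
]
print(classify(docs))  # ['pos', 'neg', 'pos', 'neg', 'pos', 'neg']
```

Here the first three sentences clear the threshold and become pseudo-labels; the last three (weak or no lexicon evidence) are labeled by the classifier trained on them. Raising the threshold yields fewer but cleaner pseudo-labels, while lowering it gives the second stage more, noisier training data.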
Subject
Computational Mathematics, Computer Science Applications, General Engineering