A mixed approach of statistical weighting method and unsupervised method to improve Uyghur sentiment classification

Author:

Yalkun Erpan [1], Slamu Wushour [1], Turhuntay Raxida [2]

Affiliation:

1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

2. College of Electronic and Information Engineering, Yili Normal University, Yili, Xinjiang 835000, China

Abstract

Considering the scarcity of Uyghur sentiment resources, this paper proposes a new combined unsupervised sentiment classification method for Uyghur text that requires no labeled corpora. In the first part, a Uyghur sentiment dictionary, UYSentiDict, is used to classify sentences. Sentiment vocabulary matching considers both the original word and its stem, and the influence of sentence patterns, negation words, and degree adverbs is also taken into account. Based on different thresholds, the sentences with higher sentiment values are selected from the lexicon-based classification results as a pseudo-labeled dataset. In the second part, a machine learning classifier learns sentiment characteristics from the pseudo-labeled dataset and then classifies the remaining data. The proposed method achieves good classification performance on Uyghur sentiment corpora from four different fields, and some of its results are better than those of a machine learning classifier. Moreover, the method is not restricted by the field of the data and does not need a training corpus labeled in advance, so it can effectively address the resource shortage in Uyghur sentiment classification.
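As a rough illustration of the two-stage procedure the abstract describes, the following minimal Python sketch scores sentences against a lexicon (trying the original word, then its stem, and applying negation and degree-adverb modifiers), keeps the high-confidence sentences as pseudo-labels, and trains a small classifier on them to label the rest. Everything concrete here is an assumption made for illustration: the toy English lexicon stands in for UYSentiDict, `stem` is a placeholder suffix stripper, the threshold of 1.5 is arbitrary, sentence patterns are not modeled, and a hand-written Naive Bayes stands in for whichever machine learning classifier the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical toy resources; the real UYSentiDict is not reproduced here.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATIONS = {"not"}
DEGREE = {"very": 1.5, "slightly": 0.5}


def stem(word):
    # Placeholder stemmer (assumption): strip one known suffix.
    for suffix in ("ness", "ly"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lexicon_score(tokens):
    """Sum lexicon values, applying pending negation and degree modifiers."""
    score, weight, negate = 0.0, 1.0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in DEGREE:
            weight *= DEGREE[tok]
        else:
            # Match the original word first, then fall back to its stem.
            value = LEXICON.get(tok, LEXICON.get(stem(tok)))
            if value is not None:
                score += (-value if negate else value) * weight
                weight, negate = 1.0, False  # modifiers are consumed
    return score


def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tokens, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("pos", "neg"):
        logp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words unseen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def classify_corpus(sentences, threshold=1.5):
    """Stage 1: lexicon pseudo-labels. Stage 2: classifier on the rest."""
    tokenized = [s.lower().split() for s in sentences]
    labels = [None] * len(sentences)
    pseudo = []
    for i, tokens in enumerate(tokenized):
        score = lexicon_score(tokens)
        if abs(score) >= threshold:
            labels[i] = "pos" if score > 0 else "neg"
            pseudo.append((tokens, labels[i]))
    model = train_nb(pseudo)
    for i, tokens in enumerate(tokenized):
        if labels[i] is None:
            labels[i] = predict_nb(model, tokens)
    return labels
```

In this sketch, a sentence such as "the service" carries no lexicon word and so scores zero in the first stage; it can still be labeled in the second stage because the classifier has learned from the pseudo-labeled sentences which non-lexicon words co-occur with each polarity. Raising the threshold yields fewer but cleaner pseudo-labels, which is the trade-off the abstract's "different thresholds" would control.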

Publisher

IOS Press

Subject

Computational Mathematics, Computer Science Applications, General Engineering

