Affiliation:
1. Thapar Institute of Engineering and Technology, India
2. Jaypee University of Information Technology, India
Abstract
Many natural language processing tasks feed into computerized decision support systems. Among these, sentiment analysis is attracting growing attention. Most sentiment analysis relies on social media content, which is highly un-normalized in nature and therefore hinders the performance of decision support systems. Improving performance requires processing this data efficiently. This article proposes a novel method for normalizing web data during the pre-processing phase, with the aim of obtaining better results across different natural language processing tasks. The technique is applied here to data for sentiment analysis. The performance of different learning models is analysed using precision, recall, F-measure, and fallout for normalized and un-normalized sentiment analysis. The results show that, after normalization, some documents shift their polarity, for example from negative to positive. Experimental results show that processing normalized data outperforms processing un-normalized data, with better accuracy.
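The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary positive/negative setting and uses the standard textbook definitions computed from confusion-matrix counts. The function name `binary_metrics` and the counts in the usage line are hypothetical, chosen only for illustration; they are not taken from the article's experiments.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, F-measure and fallout from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives for the class treated as "positive".
    """
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall (F1).
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Fallout: share of actual negatives wrongly predicted positive
    # (the false positive rate).
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fallout": fallout,
    }


# Illustrative counts only (hypothetical, not results from the article).
scores = binary_metrics(tp=40, fp=10, tn=30, fn=20)
print(scores)
```

Computing the same dictionary once for a model trained on un-normalized text and once for a model trained on normalized text would give a side-by-side comparison of the kind the abstract describes.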