Author:
Saileela P V R N S S V, Rao N. Naga Malleswara
Abstract
In sentiment analysis in particular, the problem of processing and analyzing high-dimensional data has become more prominent in recent years. The IEL-HDDSA model addresses this problem: it aims to increase accuracy and performance in sentiment analysis of complex, high-dimensional data streams. Its contribution to the field is an iterative approach to ensemble learning. The model integrates preprocessing techniques such as tokenization, stop word removal, and lemmatization with the collection of sentiment-related features. The training corpus is then divided by label, and features with high mutual information are selected. Highly replicated data points for model training can also be identified at this stage. A Naive Bayes model is trained first and is then placed in an ensemble as part of bagging. The major advantage of IEL-HDDSA over earlier methods is that it can iteratively train on selected subsets of data until performance in sentiment analysis of high-dimensional data reaches an optimum level. The model was evaluated with 10-fold cross validation, which showed consistently high performance with almost no variation across the different measures. IEL-HDDSA's precision ranged from 0.9359 to 0.9492, and its specificity was between 0. Its accuracy ranged from 0.93 to around 0.95, and its F1-measure stayed at about 0.94 or above, indicating that precision and recall were well balanced. The false alarm rate ranged from 0.056 to 0.1, a fairly low ratio of incorrect positive classifications. Moreover, MCC values ranged from 0.8668 to 0. These results testify to the IEL-HDDSA model's stable effectiveness and high reproducibility in sentiment analysis applications, especially for massive data flows.
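The measures reported in the abstract can all be derived from the confusion matrix of a binary classifier. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code. It assumes that "false alarm rate" means the false positive rate FP / (FP + TN). The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. Any ratio with a zero denominator is returned as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    # Assumption: false alarm rate = false positive rate = 1 - specificity.
    false_alarm_rate = ratio(fp, fp + tn)
    # Matthews correlation coefficient (MCC).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)

    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "false_alarm_rate": false_alarm_rate,
        "mcc": mcc,
    }


# Hypothetical counts for one evaluation fold (illustrative only).
example = binary_metrics(tp=94, fp=6, tn=94, fn=6)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Under this definition the false alarm rate is the complement of specificity. In a 10-fold cross validation, these measures would typically be computed once per fold, which is one way to obtain per-measure ranges like those quoted in the abstract.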
Publisher
Scalable Computing: Practice and Experience
Cited by
1 article.