Homogenous Ensemble Boosting Approach to Improve the Consistency in the Accuracy of Text Data Classification

Authors:

Azam Muhammad (1), Sabah Fahad (1), Raheem Abdul (2), Ahmad Nadeem (1), Irfan Danish (1), Sarwar Raheem (3)

Affiliations:

1. Superior University

2. Beijing University of Technology

3. OTEHM, Manchester Metropolitan University

Abstract

The rapid growth of the internet in recent years has produced an enormous amount of data, a significant share of which is unstructured. This unstructured data requires critical analysis and modelling to become useful for decision making. Owing to the widespread adoption of the internet across the globe, new applications are developed every day. These applications interact directly with end users, who can share their opinions, sentiments, and reviews about products, services, and events. Such feedback is very useful to individuals, organizations, businesses, and governments for future decision making. Surveys from the last few years indicate that online opinions have a more prominent financial effect than traditional media advertising. Sentiment analysis is the task of locating useful information in these user sentiments. Although user-generated content is intended to be valuable, most of it requires data mining methods and sentiment analysis before it can be used, and sentiment analysis itself faces several difficulties. Sentiment analysis applies natural language processing and text analysis methods to recognize and extract useful information from text data. Machine learning techniques are widely used for sentiment classification. In this paper, we provide an in-depth review of different machine learning systems for sentiment classification. We carry out an extensive study of homogeneous ensemble-based machine learning techniques for sentiment classification, aiming to improve efficiency and consistency by combining multiple learners to achieve better accuracy than any individual learning algorithm can attain. Our methodology covers the whole process, from data preprocessing to classification accuracy. Various preprocessing steps are applied to the selected text data to prepare it for classification. Several classification models from different classifier families (NB, NNET, KNN, RPART, SVM, LDA, CTREE) are explored. Lastly, homogeneous ensemble techniques, namely boosting (GBM) and bagging (RF), are applied and compared with the individual classifiers. The results show that the boosting ensemble model is more consistent and accurate than all the other models discussed.
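The abstract gives no implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: text preprocessing followed by a boosted ensemble of weak learners that all share one type. It uses AdaBoost over one-word decision stumps, a simpler boosting method swapped in for the GBM model named in the paper. The stop-word list, the toy reviews, and all function names are hypothetical and chosen for illustration.

```python
import math
import re

# Hypothetical stop-word list and toy reviews, for illustration only.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}
TRAIN = [
    ("This product is great and I love it", 1),
    ("Great service, excellent quality", 1),
    ("I love the excellent design", 1),
    ("Terrible product, it was awful", -1),
    ("Awful service and poor quality", -1),
    ("Poor design, terrible experience", -1),
]


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words; return a word set."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}


def stump_predict(word, polarity, doc):
    """Weak learner: vote `polarity` if `word` occurs in the document, else the opposite."""
    return polarity if word in doc else -polarity


def best_stump(docs, labels, weights):
    """Return (error, word, polarity) for the stump with the lowest weighted error."""
    vocab = sorted(set().union(*docs))
    candidates = []
    for word in vocab:
        for polarity in (1, -1):
            err = sum(w for d, y, w in zip(docs, labels, weights)
                      if stump_predict(word, polarity, d) != y)
            candidates.append((err, word, polarity))
    return min(candidates, key=lambda c: c[0])


def train_adaboost(texts, labels, rounds=3):
    """Fit `rounds` stumps, re-weighting misclassified documents after each round."""
    docs = [preprocess(t) for t in texts]
    weights = [1.0 / len(docs)] * len(docs)
    ensemble = []
    for _ in range(rounds):
        err, word, polarity = best_stump(docs, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, word, polarity))
        # Increase the weight of documents this stump got wrong, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(word, polarity, d))
                   for d, y, w in zip(docs, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, text):
    """Classify text as +1 or -1 by the sign of the weighted stump votes."""
    doc = preprocess(text)
    score = sum(a * stump_predict(word, pol, doc) for a, word, pol in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, data):
    """Fraction of (text, label) pairs classified correctly."""
    return sum(predict(ensemble, t) == y for t, y in data) / len(data)


texts = [t for t, _ in TRAIN]
labels = [y for _, y in TRAIN]
single = train_adaboost(texts, labels, rounds=1)   # one weak learner on its own
boosted = train_adaboost(texts, labels, rounds=3)  # boosted ensemble of three
print("single stump accuracy:", accuracy(single, TRAIN))
print("boosted (3 rounds) accuracy:", accuracy(boosted, TRAIN))
```

On this toy data the single stump misclassifies one of the six reviews, while three boosting rounds classify all six correctly. This is training accuracy on made-up examples and says nothing about the results reported in the paper.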

Publisher

Research Square Platform LLC
