Affiliation:
1. Department of Informatics, University of Pretoria, Gauteng, South Africa
Abstract
Search engines are tools used to find information on the Internet. Since the web has a plethora of websites, the engine queries the majority of active sites and builds a database organized according to keywords utilized in the search. Because of this, when a user types a few descriptive words on the home page of the search engine, the search function lists websites corresponding to these keywords. However, there are some problems with this search approach. For instance, if a user wants information about the word Jaguar, most search results are animals and cars. This is a polysemic problem that forces search engines to always provide the most popular but not the most relevant results. This article presents a study of using sentiment technology to help news classification and categorization and improve the classification accuracy. We have introduced a smart search function embedded into a search engine to tackle polysemic issues and record relevant results to determine their sentimentality. Therefore, this study presents a topic that involves several aspects of natural language processing (NLP) and sentiment analysis for news categorization and classification. A web crawler was used to collect British Broadcasting Corporation (BBC) news across the Internet, carried out preprocessing of text by using NLP, and applied sentiment analysis methods to determine the polarity of the processed text data. The sentimentality represents negative, positive, or neutral polarities assigned by the sentiment analysis algorithms. The research utilized the BBC news site to collect different information using a web crawler and a database to explore the sentimentality of BBC news. The natural language toolkit (NLTK) and BM25 indexed and preprocessed patterns in the database. The experimental results depict the proposed search function surpassing normal search with an accuracy rate of 85%. Moreover, the results show a negative polarity of BBC news using the Sentistrength algorithm. Furthermore, the Valence Aware Dictionary and sEntiment Reasoner (VADER) was the best-performing sentiment analysis model for news classification. This model obtained an accuracy of 85% using data collected with the proposed smart function.
Subject
Artificial Intelligence,Human-Computer Interaction,Theoretical Computer Science,Software
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献