Authors:
Veziroğlu Merve, Veziroğlu Erkan, Bucak İhsan Ömür
Abstract
The surge in digital content has increased the need for automated text classification, particularly for news categorization using natural language processing (NLP). This work presents a Python-based news classification system that uses Naive Bayes algorithms to sort news headlines into predefined categories. Naive Bayes is favored for its simplicity and effectiveness in text classification. Our objectives are to explore how such a system is built and to evaluate several Naive Bayes variants. The dataset comprises BBC News headlines spanning five categories: technology, business, sports, entertainment, and politics. We analyzed the category distribution and headline lengths to characterize the dataset. Data preprocessing involved text cleaning, stop word removal, and feature extraction with Count Vectorization, which converts text into numerical token counts. Four Naive Bayes variants were evaluated: Gaussian, Multinomial, Complement, and Bernoulli. Performance was measured with accuracy, precision, recall, and F1 score, and the Naive Bayes algorithms were compared with other classifiers: Logistic Regression, Random Forest, Linear Support Vector Classification (SVC), Multi-Layer Perceptron (MLP) Classifier, Decision Trees, and K-Nearest Neighbors. The MLP Classifier achieved the highest accuracy, while Multinomial and Complement Naive Bayes also proved robust for news classification. Effective data preprocessing played a pivotal role in accurate categorization. This work contributes insights into the performance of Naive Bayes algorithms in news classification, benefiting NLP and news categorization systems.
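The pipeline described in the abstract (Count Vectorization with stop word removal, followed by the four Naive Bayes variants and the four reported metrics) might be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the headlines are hypothetical toy examples rather than BBC data, and the choice of scikit-learn, the built-in English stop word list, and macro averaging are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

# Hypothetical toy headlines (NOT taken from the BBC dataset).
train_texts = [
    "New smartphone chip boosts battery life",
    "Software firm unveils cloud security update",
    "Shares rise as bank reports record profit",
    "Oil prices fall amid market fears",
    "Striker scores twice in cup final win",
    "Tennis champion retires after injury",
    "Film festival awards top prize to debut director",
    "Pop singer announces world tour dates",
    "Minister resigns over election funding row",
    "Parliament votes on new tax bill",
]
train_labels = ["tech", "tech", "business", "business", "sport",
                "sport", "entertainment", "entertainment", "politics", "politics"]
test_texts = [
    "Bank profit lifts market shares",
    "Champion wins final after injury scare",
    "Director wins film prize",
    "Election bill divides parliament",
    "Cloud software update improves security",
]
test_labels = ["business", "sport", "entertainment", "politics", "tech"]

# Count Vectorization: lowercase, drop English stop words, count tokens.
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "Gaussian": GaussianNB(),
    "Multinomial": MultinomialNB(),
    "Complement": ComplementNB(),
    "Bernoulli": BernoulliNB(),
}


def evaluate(model):
    """Fit one classifier and return accuracy and macro-averaged metrics."""
    # GaussianNB does not accept sparse matrices, so densify for it only.
    dense = isinstance(model, GaussianNB)
    X_tr = X_train.toarray() if dense else X_train
    X_te = X_test.toarray() if dense else X_test
    model.fit(X_tr, train_labels)
    pred = model.predict(X_te)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(test_labels, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

With so few examples the printed scores say nothing about the relative merits of the variants; the sketch only shows how the pieces fit together. Swapping in a real headline dataset and a proper train/test split would be the natural next step.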