The feature extraction for classifying words on social media with the Naïve Bayes algorithm

Published:2022-09-01 Issue:3 Volume:11 Page:1041
ISSN:2252-8938
Container-title:IAES International Journal of Artificial Intelligence (IJ-AI)
Short-container-title:IJ-AI

Author:

Lubis Arif Ridho, Nasution Mahyuddin Khairuddin Matyuso, Sitompul Opim Salim, Zamzami Elviawaty Muisa

Abstract

Classification with the Naïve Bayes classifier (NBC) requires prior pre-processing and feature extraction. Generally, pre-processing eliminates unnecessary words, while feature extraction processes the words that remain. This paper focuses on feature extraction, applying word2vec for calculations and searches and term frequency-inverse document frequency (TF-IDF) for frequency weighting. The words are classified on a set of 1734 tweets, each treated as a document and weighted with TF-IDF: the more often a word appears in the tweets, the lower its TF-IDF value, and vice versa. Once the weight of each word in a tweet has been obtained, classification is carried out using Naïve Bayes on 1734 test data, yielding an accuracy of 88.8% for tweets in the slack word category and 78.79% for tweets in the verb category. It can be concluded that words available on Twitter can be classified into those that refer to slack words and those that refer to verbs with a fairly good level of accuracy, reflecting the habits of Twitter social media users.
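As a rough illustration of the weighting and classification steps described in the abstract, the Python sketch below computes textbook TF-IDF weights and feeds them to a small multinomial Naïve Bayes classifier with Laplace smoothing. It is not the authors' implementation: the abstract does not state which TF-IDF variant or smoothing was used, the word2vec step is omitted, and the function names (`tf_idf`, `train_nb`, `predict_nb`) and toy tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed textbook form: tf = count / document length,
    idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def train_nb(features, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on weighted features (alpha = Laplace smoothing)."""
    vocab = {term for feat in features for term in feat}
    class_counts = Counter(labels)
    # Sum of feature weights per class and term.
    term_mass = defaultdict(Counter)
    for feat, label in zip(features, labels):
        for term, weight in feat.items():
            term_mass[label][term] += weight
    model = {}
    for label, n in class_counts.items():
        total = sum(term_mass[label].values()) + alpha * len(vocab)
        log_prior = math.log(n / len(labels))
        log_lik = {
            term: math.log((term_mass[label][term] + alpha) / total)
            for term in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict_nb(model, feature):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(
            weight * log_lik[term]
            for term, weight in feature.items()
            if term in log_lik
        )
    return max(model, key=score)


# Hypothetical toy tweets, already tokenised and pre-processed.
tweets = [
    ["gonna", "chill", "bro"],
    ["wanna", "chill", "lol"],
    ["run", "walk", "eat"],
    ["eat", "sleep", "run"],
]
categories = ["slang", "slang", "verb", "verb"]

model = train_nb(tf_idf(tweets), categories)
print(predict_nb(model, Counter(["chill", "bro"])))
```

With this idf form, a term that occurs in every document receives weight zero, which matches the abstract's remark that frequent words get lower TF-IDF values.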

Publisher

Institute of Advanced Engineering and Science

Subject

Electrical and Electronic Engineering,Artificial Intelligence,Information Systems and Management,Control and Systems Engineering

Cited by 6 articles.

1. A comparison of the performance of data mining classification algorithms on medical datasets with the application of data normalization;AIP Conference Proceedings;2024

2. Implementation of Preprocessing in Text Summarization Techniques for Indonesian Language Documents Using the Flax T5 Approach;2023 11th International Conference on Cyber and IT Service Management (CITSM);2023-11-10

3. Comparison Between Bi-Directional LSTM And Transfer Learning in Correcting Typing Errors on Twitter Social Media Posts;2023 11th International Conference on Cyber and IT Service Management (CITSM);2023-11-10

4. Optimization of SVM Classification Accuracy with Bayesian Optimization Utilizing Data Augmentation;2023 6th International Conference of Computer and Informatics Engineering (IC2IE);2023-09-14

5. Comparison of Model in Predicting Customer Churn Based on Users' habits on E-Commerce;2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI);2022-12-08