Youtube spam detection framework using naïve bayes and logistic regression

Published:2019-06-01 Issue:3 Volume:14 Page:1508
ISSN:2502-4760
Container-title:Indonesian Journal of Electrical Engineering and Computer Science
Short-container-title:IJEECS

Author:

Samsudin Nur’Ain Maulat,Mohd Foozy Cik Feresa binti,Alias Nabilah,Shamala Palaniappan,Othman Nur Fadzilah,Wan Din Wan Isni Sofiah

Abstract

YouTube has become a popular social media platform. Because of its popularity, spammers use YouTube comments to distribute spam. This is a concern because spam can lead to phishing attacks, whose target can be any user who clicks a malicious link. Spam has its own features that can be analyzed and detected by classification. Hence, enhanced features are proposed to detect YouTube spam. To conduct the experiments, a YouTube spam detection framework was developed that consists of five (5) phases: data collection, pre-processing, feature selection and extraction, classification, and detection. This paper proposes the YouTube spam detection framework, then examines and validates each of its phases using two data mining tools. The features are constructed from an analysis of data collected from the YouTube Spam dataset, classified with Naïve Bayes and Logistic Regression, and tested in two different data mining tools, Weka and RapidMiner. In the analysis, the thirteen (13) features tested on Weka and RapidMiner show high accuracy and are therefore used throughout the experiments in this research. The results of Naïve Bayes and Logistic Regression in Weka are slightly higher than in RapidMiner. In addition, the result of Naïve Bayes in Weka is higher than that of Logistic Regression, at 87.21% and 85.29% respectively. In RapidMiner the accuracies differ only slightly: 80.41% for Naïve Bayes and 80.88% for Logistic Regression. However, the precision of Naïve Bayes is higher than that of Logistic Regression.
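
The classification and evaluation phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the paper ran Naïve Bayes in Weka and RapidMiner on thirteen constructed features, whereas this sketch assumes a plain bag-of-words multinomial Naïve Bayes with Laplace smoothing. The comments in `TRAIN` and `TEST` are hypothetical toy data, not taken from the YouTube Spam dataset, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy comments (1 = spam, 0 = legitimate).
# Assumption: these are invented for illustration only.
TRAIN = [
    ("check out my channel and subscribe", 1),
    ("free gift click this link now", 1),
    ("subscribe to my channel for free prizes", 1),
    ("i love this song so much", 0),
    ("great video thanks for sharing", 0),
    ("this song brings back memories", 0),
]
TEST = [
    ("click the link and subscribe to my channel", 1),
    ("free prizes on my channel", 1),
    ("love this video so much", 0),
    ("thanks for this great song", 0),
]


def tokenize(text):
    """Pre-processing: lowercase the comment and split on whitespace."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy_and_precision(y_true, y_pred):
    """Accuracy over all comments; precision for the spam class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


model = train_naive_bayes(TRAIN)
y_true = [label for _, label in TEST]
y_pred = [predict(model, text) for text, _ in TEST]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Accuracy and precision are reported separately because they can rank classifiers differently, as the RapidMiner figures in the abstract suggest: accuracy counts every correct decision, while precision only asks how many comments flagged as spam really were spam.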

Publisher

Institute of Advanced Engineering and Science

Subject

Electrical and Electronic Engineering,Control and Optimization,Computer Networks and Communications,Hardware and Architecture,Information Systems,Signal Processing

Cited by 20 articles.

1. CommentClass: A Robust Ensemble Machine Learning Model for Comment Classification;International Journal of Computational Intelligence Systems;2024-07-15

2. A Robust Ensemble Machine Learning Model with Advanced Voting Techniques for Comment Classification;Lecture Notes in Computer Science;2024

3. Ensemble Learning based Efficient Spam Detection of YouTube Comments;2023 6th International Conference on Advances in Science and Technology (ICAST);2023-12-08

4. Automated Spam Detection Using Sandpiper Optimization Algorithm-Based Feature Selection with the Machine Learning Model;IETE Journal of Research;2023-11-23

5. Youtube Spam Detection Scheme Using Stacked Ensemble Machine Learning Model;2023 International Conference on Network, Multimedia and Information Technology (NMITCON);2023-09-01