A Performance Comparison of Unsupervised Techniques for Event Detection from Oscar Tweets

Author:

Malik Muzamil (1), Aslam Waqar (1), Aslam Zahid (1), Alharbi Abdullah (2), Alouffi Bader (3), Rauf Hafiz Tayyab (4)

Affiliation:

1. Department of Computer Science & Information Technology, Islamia University of Bahawalpur, Bahawalpur, Pakistan

2. Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

3. Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

4. Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent, UK

Abstract

Social media influences people's lives. It is an essential source for sharing news, raising awareness, detecting events, and gauging people's interests, and it covers a wide range of topics and events. Extensive work has been published on capturing interesting events and insights from datasets, and many techniques have been presented for detecting events on social media networks such as Twitter. In text mining, however, most work is done on a specific dataset, so new datasets are needed to analyse the performance and generic nature of Topic Detection and Tracking methods. This paper therefore publishes a dataset of a real-life event, the Oscars 2018, gathered from Twitter, and compares five methods on it: soft frequent pattern mining (SFPM), singular value decomposition and k-means (K-SVD), feature-pivot (Feat-p), document-pivot (Doc-p), and latent Dirichlet allocation (LDA). The dataset contains 2,160,738 tweets collected using a set of seed words; only English tweets are considered. All of the methods applied in this paper are unsupervised, an area that still needs to be explored on different datasets. The methods are evaluated on the Oscars 2018 dataset using keyword precision (K-Prec), keyword recall (K-Rec), and topic recall (T-Rec) for detecting events of greater interest. SFPM achieved the highest K-Prec, K-Rec, and T-Rec, although these scores started to decrease as the number of clusters increased. Feat-p achieved the lowest performance on all three metrics. Experiments on the Oscars 2018 dataset demonstrated that all the methods are generic in nature and produce meaningful clusters.

Funder

Taif University

Publisher

Hindawi Limited

Subject

General Mathematics, General Medicine, General Neuroscience, General Computer Science
