Uncovering Active Communities from Directed Graphs on Distributed Spark Frameworks, Case Study: Twitter Data

Published: 2021-09-22; Volume: 5; Issue: 4; Page: 46
ISSN: 2504-2289
Container-title: Big Data and Cognitive Computing
Language: en
Short-container-title: BDCC

Authors:

Moertini Veronica S., Adithia Mariskha T.

Abstract

Directed graphs can be constructed from big data containing information about people's interactions. In these graphs, the vertices represent people, while the directed edges denote the interactions among them. The number of interactions within a given time interval can be stored as an edge attribute: the larger the count, the more frequently the people (vertices) interact with each other. Subgraphs whose edges have counts larger than a threshold value can be extracted from these graphs, and temporal active communities can then be mined from each of these subgraphs. Apache Spark is widely recognized as a fast and scalable framework for processing big data. It provides the DataFrames, GraphFrames, and GraphX APIs, which can be used to analyze big graphs. We propose three kinds of active communities, namely Similar interest communities (SIC), Strong-interacting communities (SC), and Strong-interacting communities with their "inner circle" neighbors (SCIC), along with the algorithms for uncovering them. The design and implementation of these algorithms are based on the APIs above. We conducted experiments on a Spark cluster of ten machines. The results show that the proposed algorithms are able to uncover active communities from public big graphs as well as from Twitter data collected using Spark Structured Streaming. In some cases, the algorithms based on GraphFrames' motif finding have shorter execution times.
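The pipeline described in the abstract (weighted directed interaction graph, threshold filter on the interaction count, then community extraction) can be illustrated with a small single-machine sketch. This is NOT the authors' SIC, SC, or SCIC algorithm and does not use Spark; it is a simplified stand-in under stated assumptions. Edges are assumed to be `(src, dst, count)` triples, the vertex names and counts in `EDGES` are hypothetical, a "strong interaction" is assumed to mean that both directions survive the threshold (the two-edge motif that GraphFrames' `find("(a)-[e1]->(b); (b)-[e2]->(a)")` would match), and communities are taken to be the connected components of the resulting undirected graph. The helper names `threshold_subgraph`, `mutual_pairs`, and `weak_components` are hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction data: (source, destination, interaction count).
EDGES = [
    ("alice", "bob", 12), ("bob", "alice", 9),
    ("bob", "carol", 7), ("carol", "bob", 15),
    ("dave", "erin", 20), ("erin", "dave", 2),
    ("frank", "alice", 1),
]


def threshold_subgraph(edges, threshold):
    """Keep only directed edges whose interaction count exceeds the threshold."""
    return [(s, d, c) for s, d, c in edges if c > threshold]


def mutual_pairs(edges):
    """Two-edge motif (a)->(b), (b)->(a): unordered pairs linked in both directions."""
    present = {(s, d) for s, d, *_ in edges}
    return sorted({tuple(sorted((s, d))) for s, d in present
                   if (d, s) in present and s != d})


def weak_components(edges):
    """Connected components, ignoring edge direction (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for s, d, *_ in edges:  # extra fields such as the count are ignored
        union(s, d)

    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    # Largest communities first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    sub = threshold_subgraph(EDGES, threshold=5)
    print("edges above threshold:", sub)
    print("mutual pairs:", mutual_pairs(sub))
    print("components of filtered graph:", weak_components(sub))
    print("strong-interaction groups:", weak_components(mutual_pairs(sub)))
```

With a threshold of 5, the edges `erin -> dave` and `frank -> alice` are dropped, so `dave` and `erin` remain connected but no longer interact in both directions, and only `alice`, `bob`, and `carol` form a mutually interacting group. On a Spark cluster, the analogous steps would roughly be a DataFrame `filter` on the count column, a GraphFrames motif query, and a connected-components computation (available in both GraphFrames and GraphX); the exact formulation the authors use is given in the paper itself.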

Funder

Direktorat Riset dan Pengabdian Masyarakat, Direktorat Jenderal Penguatan Riset dan Pengembangan, Kemenristekdikti, Indonesia

Publisher

MDPI AG

Subject

Artificial Intelligence, Computer Science Applications, Information Systems, Management Information Systems

Link

https://www.mdpi.com/2504-2289/5/4/46/pdf

References (30 articles)

1. Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

2. Community detection in graphs; Fortunato, 2010

3. Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks

4. Spark GraphX in Action; Malak, 2016

Cited by 1 article.

1. BDPS: An Efficient Spark-Based Big Data Processing Scheme for Cloud Fog-IoT Orchestration; Information; 2021-12-10