Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data

Author:

Mandviwalla Aamir (1, 2), Elsisy Amr (1, 2), Atique Muhammad Saad (1, 2), Kuzmin Konstantin (1, 2), Gaiteri Chris (3, 4), Szymanski Boleslaw K. (1, 2)

Affiliation:

1. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA

2. Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY 12180, USA

3. Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA

4. Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA

Abstract

Mapping network nodes and edges to communities and network functions is crucial to gaining a higher-level understanding of a network's structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without ground-truth knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of an already recorded network simply increases the repetition count of that network. We treat each such network as a variant of the ground truth, with its probability of arising in the real world approximated by the ratio of the count of this network's duplicates plus one to the total number of generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step, over several steps, lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify the nodes whose monitoring would achieve the most significant reduction in entropy. Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data.
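
The pooling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the generative model, so the independent edge-flip distortion, the `flip_prob` parameter, and all function names here are assumptions made for illustration. The sketch records each unique variant, counts its repetitions, estimates each variant's probability as (duplicates + 1) / (total generated), and computes the Shannon entropy of the resulting distribution over variants. The entropy measure used in the paper may be defined over different quantities.

```python
import math
import random
from collections import Counter
from itertools import combinations


def distort(edges, nodes, flip_prob, rng):
    """Return one variant as a frozenset of edges.

    Assumed distortion model: the presence of an edge between each
    node pair is flipped independently with probability flip_prob.
    """
    variant = set()
    for pair in combinations(sorted(nodes), 2):
        present = pair in edges
        if rng.random() < flip_prob:
            present = not present
        if present:
            variant.add(pair)
    return frozenset(variant)


def generate_pool(edges, nodes, n_samples, flip_prob, seed=0):
    """Generate n_samples variants; map each unique variant to its count."""
    rng = random.Random(seed)
    # Normalize undirected edges so (2, 1) and (1, 2) are the same edge.
    normalized = {tuple(sorted(e)) for e in edges}
    return Counter(
        distort(normalized, nodes, flip_prob, rng) for _ in range(n_samples)
    )


def variant_probabilities(pool):
    """Estimate each variant's probability.

    A variant's count equals its number of duplicates plus one (the first
    recorded occurrence), so count / total matches the ratio in the abstract.
    """
    total = sum(pool.values())
    return {variant: count / total for variant, count in pool.items()}


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

For example, `generate_pool([(1, 2), (2, 3)], [1, 2, 3], n_samples=200, flip_prob=0.1)` returns the pool of variant counts, and passing the values of `variant_probabilities(pool)` to `shannon_entropy` gives the uncertainty of that pool in bits. With `flip_prob=0` every generated network equals the original, so the pool holds a single variant with probability 1 and entropy 0.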

Funder

U.S. Department of Homeland Security

Defense Advanced Research Projects Agency

Publisher

MDPI AG

Subject

General Physics and Astronomy
