Multilayer networks for text analysis with multiple data types

Published:2021-06-28 Issue:1 Volume:10
ISSN:2193-1127
Container-title:EPJ Data Science
language:en
Short-container-title:EPJ Data Sci.

Author:

Hyland Charles C.,Tao Yuanming,Azizi Lamiae,Gerlach Martin,Peixoto Tiago P.,Altmann Eduardo G.

Abstract

We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of datasets, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main innovation of our approach over other techniques is that it applies the same non-parametric probabilistic framework to the different sources of datasets simultaneously. The key difference to other multilayer complex networks is the strong unbalance between the layers, with the average degree of different node types scaling differently with system size. We show that the latter observation is due to generic properties of text, such as Heaps’ law, and strongly affects the inference of communities. We present and discuss the performance of our method in different datasets (hundreds of Wikipedia documents, thousands of scientific papers, and thousands of E-mails) showing that taking into account multiple types of information provides a more nuanced view on topic- and document-clusters and increases the ability to predict missing links.
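
The imbalance between node types mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code. It assumes a document-word layer with one edge per word token, and it uses Zipf-like random sampling as a stand-in for real text; the function names (`layer_degrees`, `synthetic_corpus`) and all parameter values are hypothetical. With N tokens, D documents, and V distinct words, the average document degree is N/D and the average word degree is N/V. Heaps' law states that V grows sublinearly with N, so the average word degree should keep growing with corpus size while the average document degree stays at the document length.

```python
import random
from itertools import accumulate


def layer_degrees(docs):
    """Average degree of document nodes and of word nodes in a
    document-word bipartite multigraph with one edge per token."""
    n_tokens = sum(len(d) for d in docs)
    vocabulary = {w for d in docs for w in d}
    return n_tokens / len(docs), n_tokens / len(vocabulary)


def synthetic_corpus(n_docs, doc_len=100, pool=50_000, seed=0):
    """Illustrative assumption: every document has doc_len tokens, and
    word of rank r is drawn with probability proportional to 1/r."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(1.0 / r for r in range(1, pool + 1)))
    ranks = range(pool)
    return [rng.choices(ranks, cum_weights=cum_weights, k=doc_len)
            for _ in range(n_docs)]


if __name__ == "__main__":
    for n_docs in (100, 1000):
        doc_deg, word_deg = layer_degrees(synthetic_corpus(n_docs))
        print(f"D={n_docs}: avg document degree={doc_deg:.1f}, "
              f"avg word degree={word_deg:.2f}")
```

In this toy setting the document degree is fixed by construction, so any change in the ratio between the two averages comes from vocabulary growth alone.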

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Computer Science Applications,Modeling and Simulation

Link

https://link.springer.com/content/pdf/10.1140/epjds/s13688-021-00288-5.pdf

References: 46 articles.

1. Kedem B, De Oliveira V, Sverchkov M (2017) Statistical data fusion. World Scientific, Singapore

2. Castanedo F (2013) A review of data fusion techniques. Sci World J 2013:704504

3. Zhu Y, Yan X, Getoor L, Moore C (2013) Scalable text and link analysis with mixed-topic link models. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 473–481

4. Kivelä M, Arenas A, Barthelemy M, Gleeson J, Moreno Y, Porter M (2014) Multilayer networks. J Complex Netw 2(3):203–271

5. Zanin M, Papo D, Sousa PA, Menasalvas E, Nicchi A, Kubik E, Boccaletti S (2016) Combining complex networks and data mining: why and how. Phys Rep 635:1–44

Cited by 10 articles.

1. Data Collection and Analysis for Small-Town Policing: Challenges and Recommendations;Statistics and Public Policy;2024-08-29

2. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach;Cancers;2024-03-29

3. The concept of decentralization through time and disciplines: a quantitative exploration;EPJ Data Science;2023-10-03

4. The dynamic resilience of urban labour networks;Royal Society Open Science;2023-07

5. Implementation of Data Science Techniques in the ACM Computing Classification System;2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME);2022-11-16