Abstract
Several natural and artificial structures are characterized by an intrinsic hierarchical organization. The present work describes a methodology for quantifying the degree of adherence between a given hierarchical template and a modular document (e.g. a book or homepage with content organized into modules) represented as a content network. The original document, which in the present work is a Wikipedia page, is transformed into a content network by first dividing it into parts or modules. Then, the contents (words) of each pair of modules are compared using the coincidence similarity index, yielding a weight for that pair. The adherence between the hierarchical template and the content network can then be measured by the coincidence similarity between their adjacency matrices, leading to the hierarchical adherence index. To provide additional information about this adherence, four specific indices are also proposed, quantifying the number of links between non-adjacent levels, links between nodes in the same level, converging links between adjacent levels, and missing links. The potential of the approach is illustrated with model-theoretical networks as well as with real-world data obtained from Wikipedia. In addition to confirming the effectiveness of the suggested concepts and methods, the results suggest that real-world documents tend not to adhere substantially to their hierarchical templates.
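The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the coincidence similarity is taken to be the product of the Jaccard and interiority (overlap) indices over word multisets, words are extracted with a simple regular expression, and the adherence is computed over the upper triangle of symmetric matrices. The function names (`word_counts`, `coincidence`, `content_network`, `adherence`) are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def word_counts(text):
    """Bag of words for one module (lowercased alphabetic tokens; an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def coincidence(a, b):
    """Coincidence similarity between two word multisets.

    Assumed form: Jaccard index times interiority (overlap) index.
    """
    inter = sum((a & b).values())   # multiset intersection: min of counts
    union = sum((a | b).values())   # multiset union: max of counts
    smaller = min(sum(a.values()), sum(b.values()))
    if union == 0 or smaller == 0:
        return 0.0
    return (inter / union) * (inter / smaller)


def content_network(modules):
    """Symmetric weight matrix: one node per module, one weight per pair."""
    bags = [word_counts(m) for m in modules]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = coincidence(bags[i], bags[j])
    return weights


def adherence(template, weights):
    """Coincidence similarity between two adjacency matrices (upper triangle)."""
    n = len(template)
    pairs = [(template[i][j], weights[i][j])
             for i in range(n) for j in range(i + 1, n)]
    mins = sum(min(t, w) for t, w in pairs)
    maxs = sum(max(t, w) for t, w in pairs)
    smaller = min(sum(t for t, _ in pairs), sum(w for _, w in pairs))
    if maxs == 0 or smaller == 0:
        return 0.0
    return (mins / maxs) * (mins / smaller)


# Toy example: three short "modules" and a template linking only the first two.
modules = ["network hierarchy tree", "network hierarchy", "cooking recipes"]
template = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
weights = content_network(modules)
score = adherence(template, weights)
```

The sketch covers only the overall adherence index; the four auxiliary indices mentioned in the abstract would additionally require the level assigned to each node in the template.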
Funder
Fundação de Amparo à Pesquisa do Estado de São Paulo
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Subject
Artificial Intelligence, Computer Networks and Communications, Computer Science Applications, Information Systems