Affiliation:
1. Université Grenoble Alpes, CNRS, Grenoble, France
Abstract
Fat-tailed distributions occur in many large-scale physical and social complex systems, and different generating mechanisms have been proposed for them. In this paper, we study models that generate power-law distributions in the evolution of large-scale taxonomies such as the Open Directory Project, which consist of websites assigned to one of tens of thousands of categories. The categories in such taxonomies are arranged in tree- or DAG-structured configurations with parent-child relations among them. We first quantitatively analyse the formation process of such taxonomies, which leads to power-law distributions as the stationary distributions. In the context of designing classifiers for large-scale taxonomies, which automatically assign unseen documents to leaf-level categories, we highlight how the fat-tailed nature of these distributions can be leveraged to analytically study the space complexity of such classifiers. Empirical evaluation of the space complexity on publicly available datasets demonstrates the applicability of our approach.
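The abstract does not state which formation process is analysed. As an illustration only, the sketch below simulates Simon's classic preferential-attachment process, which is an assumption here and not necessarily the authors' model. Each new document founds a new category with probability `alpha`; otherwise it joins an existing category with probability proportional to that category's current size. In this model the fraction of categories of size k decays roughly as k^-(1 + 1/(1 - alpha)), and a fraction 1/(2 - alpha) of categories end up with a single document. All function names and parameters are hypothetical.

```python
import random


def simon_process(n_docs: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulate Simon's model (assumed illustration, not the paper's model).

    Returns a list where entry c is the number of documents in category c.
    """
    rng = random.Random(seed)
    sizes = [1]    # start with one category holding one document
    owners = [0]   # owners[i] = category of the i-th document seen so far
    for _ in range(n_docs - 1):
        if rng.random() < alpha:
            # open a new category containing only this document
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # picking a uniformly random earlier document selects its category
            # with probability proportional to the category's size
            cat = owners[rng.randrange(len(owners))]
            sizes[cat] += 1
            owners.append(cat)
    return sizes


def singleton_fraction(sizes: list[int]) -> float:
    """Fraction of categories that contain exactly one document."""
    return sum(1 for s in sizes if s == 1) / len(sizes)


def predicted_singleton_fraction(alpha: float) -> float:
    """Stationary singleton fraction in Simon's model: 1 / (2 - alpha)."""
    return 1.0 / (2.0 - alpha)


def predicted_tail_exponent(alpha: float) -> float:
    """Exponent gamma in P(size = k) ~ k^(-gamma) for Simon's model."""
    return 1.0 + 1.0 / (1.0 - alpha)


if __name__ == "__main__":
    sizes = simon_process(200_000, alpha=0.1)
    print("categories:", len(sizes), "largest:", max(sizes))
    print("singleton fraction:", round(singleton_fraction(sizes), 3),
          "predicted:", round(predicted_singleton_fraction(0.1), 3))
    print("predicted tail exponent:", round(predicted_tail_exponent(0.1), 3))
```

Running the script shows the fat-tailed shape. About half of the categories hold a single document, while the largest ones absorb thousands, even though the mean size is only about 1/alpha.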
Publisher
Association for Computing Machinery (ACM)