Abstract
Urn models for innovation capture fundamental empirical laws shared by several real-world processes, such as Heaps' and Zipf's laws. The so-called urn model with triggering includes, as particular cases, the urn representations of the two-parameter Poisson-Dirichlet process and of the Dirichlet process, both seminal in Bayesian non-parametric inference. In this work, we leverage this connection to introduce a general approach for quantifying the closeness between symbolic sequences, and we test it on the authorship attribution problem. Across different scenarios, the method achieves high accuracy compared with related methods, while offering a substantial gain in computational efficiency and theoretical transparency. Beyond its practical convenience, this work shows how the recently established connection between urn models and non-parametric Bayesian inference can pave the way for more efficient inference methods. In particular, the hybrid approach we propose allows us to relax the exchangeability hypothesis, which can be especially relevant for systems exhibiting complex correlation patterns and non-stationary dynamics.
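The abstract stays at a high level. The following is therefore only a minimal Python sketch of the two urn schemes it names, written under our own assumptions. `simulate_umt` follows the urn model with triggering as described by Tria et al. (2014): draw a ball, return it together with ρ extra copies, and add ν + 1 brand-new colours whenever a colour is drawn for the first time. For ν < ρ the number of distinct colours grows roughly as t^(ν/ρ) (Heaps' law). `py_log_prob` scores a sequence with the Pitman-Yor (two-parameter Poisson-Dirichlet) predictive rule, which reduces to the Dirichlet process when α = 0. All function names, the uniform base distribution over a hypothetical vocabulary of size `vocab_size`, and the `cross_entropy` closeness score are illustrative assumptions. This is not the authors' method. In particular, their hybrid approach relaxes exchangeability, whereas the Pitman-Yor rule below is exchangeable.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, Optional


def simulate_umt(steps: int, rho: int, nu: int, n0: int = 1, seed: int = 0) -> list[int]:
    """Urn model with triggering; returns the sequence of drawn colours."""
    if n0 < 1:
        raise ValueError("the urn must start with at least one ball")
    rng = random.Random(seed)
    urn = list(range(n0))            # one ball for each initial colour
    next_colour = n0                 # label of the next brand-new colour
    seen: set[int] = set()
    drawn: list[int] = []
    for _ in range(steps):
        c = rng.choice(urn)
        drawn.append(c)
        urn.extend([c] * rho)        # reinforcement: rho extra copies of c
        if c not in seen:            # a novelty triggers nu + 1 new colours
            seen.add(c)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
    return drawn


def py_log_prob(seq: Iterable[Hashable], theta: float, alpha: float,
                prior: Optional[Iterable[Hashable]] = None,
                new_symbol_prob: float = 1.0) -> float:
    """Natural-log probability of `seq` under the Pitman-Yor predictive rule.

    With n draws and K distinct colours so far, a seen symbol s has probability
    (n_s - alpha) / (theta + n) and a new colour (theta + alpha*K) / (theta + n).
    The identity of a new colour is scored with `new_symbol_prob` (assumed
    uniform base; 1.0 scores the colour pattern only). `prior` pre-loads the
    urn, i.e. conditions on a reference sequence. alpha = 0: Dirichlet process.
    """
    if not (0.0 <= alpha < 1.0 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    if not 0.0 < new_symbol_prob <= 1.0:
        raise ValueError("new_symbol_prob must lie in (0, 1]")
    counts = Counter(prior if prior is not None else [])
    n, k = sum(counts.values()), len(counts)
    logp = 0.0
    for s in seq:
        if counts[s] > 0:
            p = (counts[s] - alpha) / (theta + n)   # reinforce an existing colour
        else:
            # The first draw from an empty urn is a new colour with probability 1.
            p_new = 1.0 if n == 0 else (theta + alpha * k) / (theta + n)
            p = p_new * new_symbol_prob
            k += 1
        logp += math.log(p)
        counts[s] += 1
        n += 1
    return logp


def cross_entropy(test: Iterable[Hashable], reference: Iterable[Hashable],
                  theta: float = 1.0, alpha: float = 0.5,
                  vocab_size: int = 10_000) -> float:
    """Hypothetical closeness score: bits per symbol of `test` given `reference`."""
    test = list(test)
    if not test:
        raise ValueError("test sequence is empty")
    logp = py_log_prob(test, theta, alpha, prior=reference,
                       new_symbol_prob=1.0 / vocab_size)
    return -logp / (len(test) * math.log(2))
```

As a usage example, `cross_entropy(unknown.split(), candidate.split())` could be computed for each candidate author's text, and the candidate with the lowest score taken as the attribution. The values of θ, α and `vocab_size` are placeholders that would need to be fitted. Scoring unseen symbols through a base distribution matters here: without it, a reference that shares no symbols with the test text could receive a better score than one that does.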
Publisher
Springer Science and Business Media LLC