Affiliation:
1. Department of Industrial Engineering, Tel Aviv University, Israel, 6997801
Abstract
Consider a finite sample from an unknown multinomial distribution. Inferring the underlying multinomial parameters is a basic problem in statistics and related fields. Currently known methods focus on classical regimes where the sample is large, or both the sample and the alphabet are small. In this work we study the complementary large-alphabet regime, in which the number of samples is comparable with (or even smaller than) the alphabet size. We introduce a novel inference scheme that significantly improves upon currently known methods. Our proposed scheme is robust, easy to apply and provides favourable performance guarantees.
Funder
Israel Science Foundation
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics, Computational Theory and Mathematics, Numerical Analysis, Statistics and Probability, Analysis