Affiliation:
1. University of Maryland, College Park, USA
2. Institute of Science and Technology Austria
Abstract
A central problem in computational statistics is to convert a procedure for sampling combinatorial objects into a procedure for counting those objects, and vice versa. We consider sampling problems coming from Gibbs distributions, which are families of probability distributions over a discrete space \(\Omega\) with probability mass function of the form \(\mu^{\Omega}_{\beta}(\omega)\propto e^{\beta H(\omega)}\) for \(\beta\) in an interval \([\beta_{\min},\beta_{\max}]\) and \(H(\omega)\in\{0\}\cup[1,n]\). Two important parameters are the partition function, which is the normalization factor \(Z(\beta)=\sum_{\omega\in\Omega}e^{\beta H(\omega)}\), and the vector of preimage counts \(c_{x}=|H^{-1}(x)|\).
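The definitions above can be illustrated with a minimal sketch. The toy choice of \(\Omega\) (all subsets of a 4-element set) and Hamiltonian \(H(\omega)=|\omega|\) are hypothetical examples, not from the paper; the sketch checks the identity \(Z(\beta)=\sum_{x}c_{x}e^{\beta x}\), which follows directly from the definitions and links the partition function to the preimage counts.

```python
import math
from collections import Counter
from itertools import chain, combinations

# Toy discrete space Omega: all subsets of a 4-element ground set.
items = range(4)
omega_space = list(chain.from_iterable(combinations(items, r) for r in range(5)))

def H(omega):
    # Hypothetical Hamiltonian with values in {0} ∪ [1, n]: subset size.
    return len(omega)

def partition_function(beta):
    # Z(beta) = sum over omega in Omega of exp(beta * H(omega))
    return sum(math.exp(beta * H(w)) for w in omega_space)

# Preimage counts c_x = |H^{-1}(x)|; here c_x = C(4, x).
counts = Counter(H(w) for w in omega_space)

# Sanity check of the identity Z(beta) = sum_x c_x * e^{beta x}.
beta = 0.5
z_direct = partition_function(beta)
z_from_counts = sum(c * math.exp(beta * x) for x, c in counts.items())
assert abs(z_direct - z_from_counts) < 1e-9
```

At \(\beta=0\), the identity reduces to \(Z(0)=|\Omega|=\sum_{x}c_{x}\), which for this toy space is \(2^{4}=16\).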
We develop black-box sampling algorithms to estimate the counts using roughly \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions, where \(q=\log\frac{Z(\beta_{\max})}{Z(\beta_{\min})}\) (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings.
As a key subroutine, we estimate all values of the partition function using \(\tilde{O}(\frac{n^{2}}{\varepsilon^{2}})\) samples for integer-valued distributions and \(\tilde{O}(\frac{q}{\varepsilon^{2}})\) samples for general distributions. This improves on a prior algorithm of Huber (2015), which computes only a single point estimate of \(Z(\beta_{\max})\) and uses a slightly larger number of samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of \(n\) and \(q\) up to logarithmic terms.
Publisher
Association for Computing Machinery (ACM)