Affiliation:
1. University of Toronto, Toronto, ON, Canada
2. Indian Statistical Institute, Kolkata, India
3. National University of Singapore, Singapore, Singapore
Abstract
Given a non-deterministic finite automaton (NFA) A with m states, and a natural number n (presented in unary), the #NFA problem asks to determine the size of the set L(A,n) of words of length n accepted by A. While the corresponding decision problem of checking the emptiness of L(A,n) is solvable in polynomial time, the #NFA problem is known to be #P-hard. Recently, the long-standing open question, whether there is an FPRAS (fully polynomial time randomized approximation scheme) for #NFA, was resolved by Arenas, Croquevielle, Jayaram, and Riveros in [ACJR19]. The authors demonstrated the existence of a fully polynomial randomized approximation scheme with a time complexity of $\tilde{O}(m^{17} n^{17} \cdot \frac{1}{\varepsilon^{14}} \cdot \log(1/\delta))$, for a given tolerance ε and confidence parameter δ.
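To make the problem statement concrete, the following is a minimal Python sketch of the two tasks described above: exact counting of |L(A,n)| and the polynomial-time emptiness check. It is not the FPRAS of [ACJR19] or of this work. The exact counter uses an on-the-fly subset construction, which can take time exponential in m in the worst case. The function names, the dictionary encoding of the transition relation, and the example automaton (binary words containing the substring "11") are all assumptions made for illustration.

```python
from collections import defaultdict


def count_accepted(transitions, start, accepting, n):
    """Exact |L(A, n)| via on-the-fly subset construction.

    `transitions` maps (state, symbol) -> iterable of successor states.
    Worst-case exponential in the number of states; illustration only.
    """
    alphabet = sorted({symbol for (_, symbol) in transitions})
    # counts[S] = number of distinct words of the current length whose
    # set of reachable NFA states is exactly S.
    counts = {frozenset([start]): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for states, c in counts.items():
            for symbol in alphabet:
                succ = frozenset(
                    t for s in states for t in transitions.get((s, symbol), ())
                )
                if succ:  # words with no run are dropped
                    nxt[succ] += c
        counts = nxt
    return sum(c for states, c in counts.items() if states & accepting)


def is_nonempty(transitions, start, accepting, n):
    """Polynomial-time check of whether L(A, n) is non-empty."""
    reach = {start}  # states reachable by some word of the current length
    for _ in range(n):
        reach = {
            t
            for (s, _), targets in transitions.items()
            if s in reach
            for t in targets
        }
    return bool(reach & accepting)


# Hypothetical example NFA: binary words containing the substring "11".
TRANSITIONS = {
    ("q0", "0"): ["q0"],
    ("q0", "1"): ["q0", "q1"],
    ("q1", "1"): ["q2"],
    ("q2", "0"): ["q2"],
    ("q2", "1"): ["q2"],
}
ACCEPTING = {"q2"}

print(count_accepted(TRANSITIONS, "q0", ACCEPTING, 3))  # 110, 111, 011 -> 3
print(is_nonempty(TRANSITIONS, "q0", ACCEPTING, 1))     # False
```

The emptiness check tracks only which states are reachable, so it stays polynomial in m and n, whereas the counter must distinguish words by the full set of states they reach; this gap is what motivates approximate counting.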
Given the prohibitively high time complexity in terms of each of the input parameters, and considering the widespread application of approximate counting (and sampling) in various tasks in Computer Science, a natural question arises: is there a faster FPRAS for #NFA that can pave the way for the practical implementation of approximate #NFA tools? In this work, we answer this question in the positive. We demonstrate that significant improvements in time complexity are achievable, and propose an FPRAS for #NFA that is more efficient in terms of both time and sample complexity.
A key ingredient in the FPRAS due to Arenas, Croquevielle, Jayaram, and Riveros [ACJR19] is inter-reducibility of sampling and counting, which necessitates a closer look at the more informative measure: the number of samples maintained for each pair of state q and length i ≤ n. In particular, the scheme of [ACJR19] maintains $O(m^{7} n^{7} / \varepsilon^{7})$ samples per pair of state and length. In the FPRAS we propose, we systematically reduce the number of samples required for each state to be only poly-logarithmically dependent on m, with significantly less dependence on n and ε, maintaining only $\tilde{O}(n^{4} / \varepsilon^{2})$ samples per state. Consequently, our FPRAS runs in time $\tilde{O}((m^{2} n^{10} + m^{3} n^{6}) \cdot \frac{1}{\varepsilon^{4}} \cdot \log^{2}(1/\delta))$. The FPRAS and its analysis use several novel insights. First, our FPRAS maintains a weaker invariant about the quality of the estimate of the number of samples for each state q and length i ≤ n. Second, our FPRAS requires only that the distribution of the maintained samples be close to the uniform distribution in total variation distance (rather than in maximum norm). We believe our insights may lead to further reductions in time complexity and thus open up a promising avenue for future work towards the practical implementation of tools for approximate #NFA.
Funder
Ministry of Education - Singapore
Publisher
Association for Computing Machinery (ACM)