Testing Closeness of Discrete Distributions


Tuğkan Batu¹, Lance Fortnow², Ronitt Rubinfeld³, Warren D. Smith⁴, Patrick White


1. London School of Economics and Political Science

2. Northwestern University

3. Massachusetts Institute of Technology and Tel Aviv University

4. Center for Range Voting


Given samples from two distributions over an n-element set, we wish to test whether these distributions are statistically close. We present an algorithm that uses a number of independent samples from each distribution that is sublinear in n, specifically $O(n^{2/3}\varepsilon^{-8/3}\log n)$. It runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the case where the $\ell_1$ distance between the distributions is small (less than $\max\{\varepsilon^{4/3}n^{-1/3}/32,\ \varepsilon n^{-1/2}/4\}$) from the case where it is large (more than $\varepsilon$). This result can be compared to the lower bound of $\Omega(n^{2/3}\varepsilon^{-2/3})$ for this problem given by Valiant [2008]. Our algorithm has applications to the problem of testing whether a given Markov process is rapidly mixing. We also present sublinear algorithms for several variants of this problem.
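The abstract does not describe the algorithm's internals. As a hedged illustration only, the Python sketch below shows collision counting, a standard way to estimate the squared ℓ2 distance ‖p − q‖₂² between two distributions from samples alone, together with the exact ℓ1 distance for distributions given explicitly. It is not the tester from this paper: it omits the acceptance thresholds and the sample sizes stated above, and all function names (self_collisions, cross_collisions, l2_sq_distance_estimate, l1_distance) are hypothetical.

```python
# Illustrative sketch only; not the paper's l1 closeness tester.
# All function names here are hypothetical.
from collections import Counter
import random


def self_collisions(samples):
    """Count unordered pairs i < j with samples[i] == samples[j]."""
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())


def cross_collisions(xs, ys):
    """Count pairs (i, j) with xs[i] == ys[j]."""
    cx, cy = Counter(xs), Counter(ys)
    # Counter returns 0 for missing keys, so elements seen in only one sample add nothing.
    return sum(c * cy[key] for key, c in cx.items())


def l2_sq_distance_estimate(xs, ys):
    """Unbiased estimate of ||p - q||_2^2 from i.i.d. samples xs ~ p and ys ~ q.

    E[self_collisions(xs)] = C(m, 2) * sum_x p(x)^2 and
    E[cross_collisions(xs, ys)] = m * k * sum_x p(x) q(x),
    so the combination below has expectation ||p||^2 + ||q||^2 - 2<p, q>.
    The estimate can be slightly negative when p and q are close.
    """
    m, k = len(xs), len(ys)
    if m < 2 or k < 2:
        raise ValueError("need at least two samples from each distribution")
    pp = self_collisions(xs) / (m * (m - 1) / 2)
    qq = self_collisions(ys) / (k * (k - 1) / 2)
    pq = cross_collisions(xs, ys) / (m * k)
    return pp + qq - 2 * pq


def l1_distance(p, q):
    """Exact l1 distance between two explicitly given distributions (dicts)."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 10, 2000
    same_a = [rng.randrange(n) for _ in range(m)]
    same_b = [rng.randrange(n) for _ in range(m)]
    disjoint = [n + rng.randrange(n) for _ in range(m)]  # support {n, ..., 2n-1}
    # p = q uniform on 10 points: the true ||p - q||_2^2 is 0
    print(round(l2_sq_distance_estimate(same_a, same_b), 3))
    # uniform on disjoint 10-point supports: the true value is 1/10 + 1/10 = 0.2
    print(round(l2_sq_distance_estimate(same_a, disjoint), 3))
```

An ℓ2 estimate constrains the ℓ1 distance only up to a factor of √n, since ‖p − q‖₁ ≤ √n ‖p − q‖₂ by Cauchy–Schwarz; this is one reason an ℓ1 closeness test needs more than this single statistic.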


Association for Computing Machinery (ACM)


Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software

References (57 articles)

1. Aho, A. V., Hopcroft, J. E., and Ullman, J. D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley.

2. Eigenvalues and expanders

3. Alon, N., Matias, Y., and Szegedy, M. 1999b. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58. doi:10.1006/jcss.1997.1545

Cited by 45 articles.

1. A fourth‐moment phenomenon for asymptotic normality of monochromatic subgraphs;Random Structures & Algorithms;2023-06-28

2. Evaluation of Categorical Generative Models - Bridging the Gap Between Real and Synthetic Data;ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2023-06-04

3. Generalized LRS Estimator for Min-Entropy Estimation;IEEE Transactions on Information Forensics and Security;2023

4. Quantum approximate counting for Markov chains and collision counting;Quantum Information and Computation;2022-11

5. Analysis of COVID-19 evolution based on testing closeness of sequential data;Japanese Journal of Statistics and Data Science;2022-01-29







