Affiliation:
1. Brown Univ., Providence, RI
Abstract
We introduce fast algorithms for selecting a random sample of $n$ records without replacement from a pool of $N$ records, where the value of $N$ is unknown beforehand. The main result of the paper is the design and analysis of Algorithm Z; it does the sampling in one pass using constant space and in $O(n(1 + \log(N/n)))$ expected time, which is optimum, up to a constant factor. Several optimizations are studied that collectively improve the speed of the naive version of the algorithm by an order of magnitude. We give an efficient Pascal-like implementation that incorporates these modifications and that is suitable for general use. Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin.
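The abstract does not spell out Algorithm Z itself, so the following is only a minimal Python sketch of the general skip-based reservoir idea, not the paper's implementation. The first $n$ records fill the reservoir; after $t$ records have been seen, record $t+1$ would enter the reservoir with probability $n/(t+1)$. Rather than drawing a random number for every record, the sketch draws how many records to pass over before the next selection. Here that skip is found by a plain sequential search over its distribution, which costs time proportional to the skip. Generating skips faster than this is what Algorithm Z is about, and this sketch does not reproduce it. The names `skip_length` and `reservoir_sample` are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")
_END = object()  # sentinel marking the end of the input stream


def skip_length(t: int, n: int, rng: random.Random) -> int:
    """Return how many records to pass over before the next selected one.

    t is the number of records processed so far (t >= n). Since record t + j
    is selected with probability n / (t + j),
        P(skip >= s) = prod_{j=1..s} (t + j - n) / (t + j).
    The skip is drawn by inverse transform with a sequential search
    (O(skip) time; a hypothetical simple method, not Algorithm Z).
    """
    v = 1.0 - rng.random()  # uniform on (0, 1], avoids an endless loop at 0
    s = 0
    quot = (t + 1 - n) / (t + 1)  # quot == P(skip >= s + 1)
    while quot > v:
        s += 1
        quot *= (t + s + 1 - n) / (t + s + 1)
    return s


def reservoir_sample(stream: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """One-pass uniform sample of n records from a stream of unknown length."""
    if n <= 0:
        return []
    it = iter(stream)
    reservoir: List[T] = []
    # The first n records always go into the reservoir.
    for record in it:
        reservoir.append(record)
        if len(reservoir) == n:
            break
    if len(reservoir) < n:
        return reservoir  # stream had fewer than n records
    t = n  # number of records processed so far
    while True:
        s = skip_length(t, n, rng)
        # Pass over s records without selecting them.
        for _ in range(s):
            if next(it, _END) is _END:
                return reservoir
        record = next(it, _END)
        if record is _END:
            return reservoir
        # The selected record replaces a uniformly chosen reservoir slot.
        reservoir[rng.randrange(n)] = record
        t += s + 1
```

For example, `reservoir_sample(range(10**6), 10, random.Random(7))` returns 10 distinct values from the range, and every value has the same chance of appearing. Drawing skips instead of per-record random numbers is what lets faster skip generators, such as the one the paper develops, reduce the number of random variates from about $N$ to about $n(1 + \log(N/n))$.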
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
947 articles.