Abstract
This paper investigates the properties of statistical procedures based on the numbers of different patterns, using generating functions for the probabilities of a prescribed number of occurrences of given patterns in a random text. Asymptotic formulae are derived for the expected value of the number of words occurring a given number of times and for the covariance matrix. The form of the optimal linear test based on these statistics is established. These problems arise in testing the randomness of a string of binary bits, DNA sequencing, source coding, synchronization, quality control protocols, etc. In particular, the probabilities of repeated (overlapping) patterns are important in information theory (the second-order properties of relative frequencies) and in molecular biology (finding patterns with unexpectedly low or high frequencies).
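As an illustration of the statistic discussed in the abstract, the following minimal sketch counts how many words of a fixed length occur exactly a given number of times in a text, with overlapping occurrences counted. It is not taken from the paper: the function names `pattern_count_spectrum` and `words_occurring`, the word length `m`, the count `r`, and the binary alphabet are all assumptions made for the example.

```python
from collections import Counter


def pattern_count_spectrum(text, m):
    """Map r -> number of distinct length-m words seen exactly r times.

    Occurrences are counted with overlaps, i.e. over every window
    text[i:i+m]. Words that never occur are not included here.
    (Hypothetical helper, not from the paper.)
    """
    occurrences = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    return Counter(occurrences.values())


def words_occurring(text, m, r, alphabet="01"):
    """Number of length-m words over `alphabet` occurring exactly r times.

    Assumes `alphabet` has distinct symbols and `text` uses only those
    symbols. For r = 0 the missing words are obtained by subtracting
    the observed words from the total number of possible words.
    """
    spectrum = pattern_count_spectrum(text, m)
    if r == 0:
        return len(alphabet) ** m - sum(spectrum.values())
    return spectrum.get(r, 0)
```

For a random binary string, `words_occurring(text, m, r)` gives an empirical value of the quantity whose expected value and covariance the abstract refers to; the asymptotic formulae themselves are not reproduced here.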
Publisher
Cambridge University Press (CUP)
Subject
Applied Mathematics, Statistics and Probability