Error bounds on multivariate Normal approximations for word count statistics

Published: 2002-09 Issue: 3 Volume: 34 Pages: 559-586
ISSN:0001-8678
Container-title:Advances in Applied Probability
language:en
Short-container-title:Advances in Applied Probability

Author:

Huang Haiyan

Abstract

Given a sequence S and a collection Ω of d words, it is of interest in many applications to characterize the multivariate distribution of the vector of counts U = (N(S, w_1), …, N(S, w_d)), where N(S, w) is the number of times a word w ∈ Ω appears in the sequence S. We obtain an explicit bound on the error made when approximating the multivariate distribution of U by the normal distribution, when the underlying sequence is i.i.d. or first-order stationary Markov over a finite alphabet. When the limiting covariance matrix of U is nonsingular, the error bounds decay at rate O((log n) / √n) in the i.i.d. case and O((log n)³ / √n) in the Markov case. In order for U to have a nondegenerate covariance matrix, it is necessary and sufficient that the counted word set Ω is not full, that is, that Ω is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, we use a version of Stein's method.
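The count vector U and the "full word set" condition can be illustrated with a minimal Python sketch. It assumes that N(S, w) counts overlapping occurrences; the function names `count_word` and `count_vector` and the example sequence are hypothetical and not taken from the paper. The last lines show why a full word set is degenerate: the counts of all words of length k always sum to n - k + 1, so they satisfy a deterministic linear constraint.

```python
import itertools

def count_word(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps (an assumption)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def count_vector(seq, words):
    """Return U = (N(S, w_1), ..., N(S, w_d)) for the given list of words."""
    return [count_word(seq, w) for w in words]

# Hypothetical example sequence over the alphabet {A, C, G, T}.
seq = "ACGTACGTAAAC"
U = count_vector(seq, ["AA", "ACG"])
print(U)

# Full word set: every word of length k = 2 over the alphabet.
full = ["".join(p) for p in itertools.product("ACGT", repeat=2)]
total = sum(count_vector(seq, full))

# Each of the n - k + 1 windows matches exactly one word of the full set,
# so the total is fixed regardless of how the sequence was generated.
print(total, len(seq) - 2 + 1)
```

Because the sum over a full word set is constant, at least one linear combination of the counts has zero variance, which is consistent with the abstract's statement that Ω must not be full for the covariance matrix to be nondegenerate.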

Publisher

Cambridge University Press (CUP)

Subject

Applied Mathematics, Statistics and Probability

References: 16 articles.

1. On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics;Rinott;Ann. Appl. Prob.,1997

2. Compound Poisson and Poisson Process Approximations for Occurrences of Multiple Words in Markov Chains

3. Finding words with unexpected frequencies in deoxyribonucleic acid sequences;Prum;J. R. Statist. Soc. B,1995

4. Probabilistic and Statistical Properties of Words: An Overview

Cited by 5 articles.

1. A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses;Journal of Computational Biology;2022-08-01

2. Normal and Compound Poisson Approximations for Pattern Occurrences in NGS Reads;Journal of Computational Biology;2012-06

3. The Power of Detecting Enriched Patterns: An HMM Approach;Journal of Computational Biology;2010-04

4. Alignment-Free Sequence Comparison (I): Statistics and Power;Journal of Computational Biology;2009-12

5. Berry-Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing;Journal of Applied Probability;2005-09