Abstract
Abstract
Let I be a finite set and S be a nonempty strict subset of I which is partitioned into classes, and let C(s) be the class containing s ∈ S. Let (Ps: s ∈ S) be a family of distributions on IN, where each Ps applies to sequences starting with the symbol s. To this family, we associate a class of distributions P(π) on IN which depends on a probability vector π. Our main results assume that, for each s ∈ S, Ps regenerates with distribution Ps' when it encounters s' ∈ S ∖ C(s). From semiregenerative theory, we determine a simple condition on π for P(π) to be time stationary. We give a similar result for the following more complex model. Once a symbol s' ∈ S ∖ C(s) has been encountered, there is a decision to be made: either a new region of type C(s') governed by Ps' starts or the region continues to be a C(s) region. This decision is modeled as a random event and its probability depends on s and s'. The aim in studying these kinds of models is to attain a deeper statistical understanding of bacterial DNA sequences. Here I is the set of codons and the classes (C(s): s ∈ S) identify codons that initiate similar genomic regions. In particular, there are two classes corresponding to the start and stop codons which delimit coding and noncoding regions in bacterial DNA sequences. In addition, the random decision to continue the current region or begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
Publisher
Cambridge University Press (CUP)
Subject
Statistics, Probability and Uncertainty,General Mathematics,Statistics and Probability