Abstract
Abstract
High order discrete Markov chain is essential to analyze the dependency structure of data sets. To apply Markov chain correctly, even though the true order is an unknown parameter, statisticians have developed multiple order estimators. It is natural to identify the strongest order estimators under different parameter combinations. Aim for evaluating the performance of estimators, we study four of them in this paper: Akaike information criteria (AIC), Bayesian information criteria (BIC), Maximal fluctuation estimation method (PS), and approximate χ
2 − distribution method (Dk
). We simulated Cr
× C transition matrices to generate word-count-based Markov sequences with the most straightforward initial distribution. We found PS and Dk
give more accurate discrete Markov order estimation. Although AIC and BIC are commonly applied, their performances are not the most accurate. The accuracy declines approximately exponentially as the Markov model gets more complex, i.e. r ≥ 1 and C ≥ 3. AIC’s accuracy is higher when the Markov chain length is relatively small, but Dk
yields a slightly higher accuracy under the same setting. PS give a more reasonable estimation when Markov order is the variable, i.e. 1 ≥ r ≥ 3. Dk
gives more reasonable estimations when the length L and alphabet size C are variable, i.e. 150 ≥ L ≥ 800 and 3 ≥ C ≥ 5.
Subject
General Physics and Astronomy