1. Amari, S., Ozeki, T., & Park, H-Y. (2003). Learning and inference in hierarchical models with singularities. Systems and Computers in Japan, 34(7), 34–42.
2. Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1), 164–171.
3. Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London.
4. Beal, M. J., Ghahramani, Z., & Rasmussen, C. E. (2002). The infinite hidden Markov model. Advances in Neural Information Processing Systems (Vol. 14). Cambridge, MA: MIT Press.
5. Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.