Affiliation:
1. Mitsubishi Electric Research Labs, Cambridge Research Center, Cambridge, MA 02139, U.S.A.
Abstract
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum a posteriori (MAP) estimator. The prior is a bias for maximally structured and minimally ambiguous models. In conditional probability models with hidden state, iterative MAP estimation drives weakly supported parameters toward extinction, effectively turning them off. Thus, structure discovery is folded into parameter estimation. We then establish criteria for simplifying a probabilistic model's graphical structure by trimming parameters and states, with a guarantee that any such deletion will increase the posterior probability of the model. Trimming accelerates learning by sparsifying the model. All operations monotonically and maximally increase the posterior probability, yielding structure-learning algorithms only slightly slower than parameter estimation via expectation-maximization and orders of magnitude faster than search-based structure induction. When applied to hidden Markov model training, the resulting models show superior generalization to held-out test data. In many cases the learned models are so sparse and concise that they are interpretable, with hidden states that strongly correlate with meaningful categories.
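The core computation the abstract describes — the MAP estimate of a multinomial parameter vector θ under the entropic prior P(θ) ∝ ∏_i θ_i^θ_i, given evidence counts ω — reduces to the stationary condition log θ_i + ω_i/θ_i + 1 + λ = 0, which has a closed-form solution in terms of the Lambert W function: θ_i = −ω_i / W₋₁(−ω_i e^{1+λ}), with the multiplier λ chosen so the estimates sum to one. The sketch below is an illustrative reconstruction from those equations, not code from the paper; the function name entropic_map, the bisection bracketing, and the requirement of strictly positive counts are assumptions of this sketch.

import numpy as np
from scipy.special import lambertw

def entropic_map(omega, tol=1e-12, max_iter=200):
    # MAP estimate of multinomial parameters theta under the entropic
    # prior P(theta) ∝ prod_i theta_i^theta_i, given evidence counts
    # omega (assumed strictly positive, with max(omega) >= 1 so the
    # bisection bracket below is valid).
    omega = np.asarray(omega, dtype=float)

    def theta(lam):
        # Stationary condition: log t + omega/t + 1 + lam = 0; the
        # maximizing root is t = -omega / W_{-1}(-omega * e^(1 + lam)).
        # Clamp at -1/e to guard floating-point drift out of W's domain.
        z = np.maximum(-omega * np.exp(1.0 + lam), -1.0 / np.e)
        return omega / -np.real(lambertw(z, k=-1))

    # sum(theta) increases with lam; bracket the normalization root.
    hi = -2.0 - np.log(omega.max())   # largest lam with a real solution
    lo = hi - omega.sum() - 50.0      # theta_i ~ omega_i/|lam| here, so sum < 1
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if theta(mid).sum() > 1.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    t = theta(0.5 * (lo + hi))
    return t / t.sum()                # exact renormalization onto the simplex

For example, entropic_map([4.0, 2.0, 1.0]) yields an estimate sharper (lower entropy) than the maximum-likelihood proportions 4/7, 2/7, 1/7, illustrating the prior's bias toward minimally ambiguous models; with hidden state, iterating such M-steps is what drives weakly supported parameters toward extinction.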
Subject
Cognitive Neuroscience, Arts and Humanities (miscellaneous)
Cited by
81 articles.
1. Combining learning and control in linear systems;European Journal of Control;2024-06
2. Separation of learning and control for cyber–physical systems;Automatica;2023-05
3. Penalized proportion estimation for non parametric mixture of regressions;Communications in Statistics - Theory and Methods;2020-01-29
4. Subject Index;Model-Based Clustering and Classification for Data Science;2019-07-31
5. Author Index;Model-Based Clustering and Classification for Data Science;2019-07-31