Greedy structure learning from data that contain systematic missing values

Published:2022-08-10 Issue:10 Volume:111 Page:3867-3896
ISSN:0885-6125
Container-title:Machine Learning
language:en
Short-container-title:Mach Learn

Author:

Liu Yang, Constantinou Anthony C.

Abstract

Learning from data that contain missing values represents a common phenomenon in many domains. Relatively few Bayesian Network (BN) structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm in terms of both learning accuracy and efficiency, both when data are missing at random and when they are missing not at random.

Funder

Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

Link

https://link.springer.com/content/pdf/10.1007/s10994-022-06195-8.pdf

Reference: 30 articles.

1. Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: what is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1), 40–49.

2. Balov, N., et al. (2013). Consistent model selection of discrete Bayesian networks from incomplete data. Electronic Journal of Statistics, 7, 1047–1077.

3. Bodewes, T., & Scutari, M. (2021). Learning Bayesian networks from incomplete data with the node-average likelihood. International Journal of Approximate Reasoning, 138, 145–160.