Affiliation:
1. Hefei University of Technology, Hefei, China
2. University of Massachusetts Boston, Boston, USA
Abstract
Causal feature selection aims at learning the Markov blanket (MB) of a class variable for feature selection. The MB of a class variable implies the local causal structure among the class variable and its MB and all other features are probabilistically independent of the class variable conditioning on its MB, this enables causal feature selection to identify potential causal features for feature selection for building robust and physically meaningful prediction models. Missing data, ubiquitous in many real-world applications, remain an open research problem in causal feature selection due to its technical complexity. In this article, we discuss a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework to enable the two key components to engage with each other. MB Learning enables Data Imputation in a potentially causal feature space for achieving accurate data imputation, while accurate Data Imputation helps MB Learning identify a reliable MB of the class variable in turn. Then, we further design an enhanced kNN estimator for imputing missing values and instantiate the MimMB. In our comprehensively experimental evaluation, our new approach can effectively learn the MB of a given variable in a Bayesian network and outperforms other rival algorithms using synthetic and real-world datasets.
Funder
National Key Research and Development Program of China
National Science Foundation of China
Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province
Publisher
Association for Computing Machinery (ACM)
Cited by
13 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献