Affiliation:
1. Anhui Agricultural University, China
2. Hefei University of Technology, China
Abstract
Almost all existing causal feature selection methods are proposed without considering the problem of sample selection bias. In practice, however, the data-gathering process cannot be fully controlled, so sample selection bias often occurs. This bias induces spurious correlations between features and the class variable, which seriously degrades the performance of existing methods. In this article, we study the problem of causal feature selection under sample selection bias and propose a novel Progressive Causal Feature Selection (PCFS) algorithm, which has three phases. First, PCFS learns sample weights that balance the treated-group and control-group distributions corresponding to each feature, in order to remove spurious correlations. Second, based on the sample weights, PCFS uses a weighted cross-entropy model to estimate the causal effect of each feature and removes some irrelevant features from the confounder set. Third, PCFS progressively repeats the first two phases to remove more irrelevant features and finally obtains a causal feature set. Experiments on synthetic and real-world datasets validate the effectiveness of PCFS in comparison with several state-of-the-art classical and causal feature selection methods.
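The first two phases described in the abstract can be illustrated with a small sketch. This is not the PCFS algorithm itself: the abstract does not give the weight-learning objective, so an entropy-balancing-style reweighting is swapped in as a stand-in, and the "weighted cross-entropy model" is assumed to be a plain weighted logistic regression. All function names, the synthetic data, and the hyperparameters are hypothetical, and the progressive third phase is not shown.

```python
import numpy as np

def entropy_balance(X_control, target_mean, steps=2000, lr=0.5):
    """Weights over control rows whose weighted covariate mean approximates
    target_mean (stand-in for the paper's weight learning, an assumption)."""
    lam = np.zeros(X_control.shape[1])
    for _ in range(steps):
        z = X_control @ lam
        w = np.exp(z - z.max())          # subtract max for numerical stability
        w /= w.sum()
        # Gradient of the convex dual: weighted control mean minus target mean.
        lam -= lr * (X_control.T @ w - target_mean)
    z = X_control @ lam
    w = np.exp(z - z.max())
    return w / w.sum()

def weighted_logistic(X, y, w, steps=3000, lr=0.5):
    """Fit logistic regression by gradient descent on weighted cross-entropy."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    w = w / w.sum()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        beta -= lr * (Xb.T @ (w * (p - y)))
    return beta

# Hypothetical synthetic data: one binary feature t plays the "treatment" role,
# and the remaining features C act as potential confounders.
rng = np.random.default_rng(0)
n = 2000
C = rng.normal(size=(n, 2))
t = (C[:, 0] + rng.normal(size=n) > 0).astype(float)
logit = 1.5 * t + C[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Phase 1 (sketch): reweight control rows so their confounder means match
# the treated rows; treated rows keep uniform weights.
treated = t == 1
w = np.empty(n)
w[treated] = 1.0 / treated.sum()
w[~treated] = entropy_balance(C[~treated], C[treated].mean(axis=0))

gap_before = np.abs(C[treated].mean(axis=0) - C[~treated].mean(axis=0)).max()
gap_after = np.abs(C[treated].mean(axis=0) - w[~treated] @ C[~treated]).max()

# Phase 2 (sketch): the coefficient of t in the weighted model is used as a
# rough score of the feature's effect on the class variable.
beta = weighted_logistic(np.column_stack([t, C]), y, w)
effect = beta[1]
print("mean gap before/after:", gap_before, gap_after)
print("estimated coefficient of t:", effect)
```

In a full procedure, this scoring would be repeated with each feature in turn treated as `t`, and features with negligible scores would be dropped before the next round; how PCFS sets that threshold is not stated in the abstract.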
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
University Synergy Innovation Program of Anhui Province
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science