Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse

Published:2021-11-12 Issue:11 Volume:23 Page:1501
ISSN:1099-4300
Container-title:Entropy
language:en
Short-container-title:Entropy

Author:

Băncioiu Camil, Brad Remus

Abstract

This article presents a novel and remarkably efficient method of computing the statistical G-test, made possible by exploiting a connection with the fundamental elements of information theory: by writing the G statistic as a sum of joint entropy terms, its computation is decomposed into easily reusable partial results with no change in the resulting value. This method greatly improves the efficiency of applications that perform a series of G-tests on permutations of the same features, such as feature selection and causal inference applications, because the decomposition allows intensive reuse of these partial results. The efficiency of this method is demonstrated by implementing it as part of an experiment involving IPC–MB, an efficient Markov blanket discovery algorithm that is applicable both as a feature selection algorithm and as a causal inference method. The results show outstanding efficiency gains for IPC–MB when the G-test is computed with the proposed method, both compared to the unoptimized G-test and compared to IPC–MB++, a variant of IPC–MB enhanced with an AD–tree, either static or dynamic. Although the proposed method of computing the G-test is presented here in the context of IPC–MB, it is bound neither to IPC–MB in particular nor to feature selection or causal inference applications in general, because it targets the information-theoretic concept that underlies the G-test, namely conditional mutual information. This aspect grants it wide applicability in data science.
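The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it only assumes the standard identity G = 2N · I(X;Y|Z), with I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z) computed from plug-in entropies in nats. The names `EntropyCache`, `g_statistic`, and `g_direct` are hypothetical, chosen for this illustration, and the data layout (a dict mapping feature names to lists of discrete values) is likewise an assumption.

```python
import math
from collections import Counter


class EntropyCache:
    """Hypothetical helper: memoizes joint entropies keyed by feature set."""

    def __init__(self, data):
        self.data = data      # assumed layout: {feature name: list of discrete values}
        self.cache = {}       # frozenset of feature names -> joint entropy (nats)
        self.misses = 0       # number of entropies actually computed

    def H(self, names):
        key = frozenset(names)
        if not key:
            return 0.0        # entropy of the empty variable set
        if key not in self.cache:
            self.misses += 1
            cols = [self.data[v] for v in sorted(key)]
            n = len(cols[0])
            counts = Counter(zip(*cols))
            self.cache[key] = -sum(
                c / n * math.log(c / n) for c in counts.values()
            )
        return self.cache[key]


def g_statistic(cache, x, y, z=()):
    """G = 2N * I(X;Y|Z), written as a sum of four joint entropy terms."""
    n = len(cache.data[x])
    z = set(z)
    cmi = (cache.H(z | {x}) + cache.H(z | {y})
           - cache.H(z | {x, y}) - cache.H(z))
    return 2.0 * n * cmi


def g_direct(data, x, y, z=()):
    """Reference computation: G = 2 * sum O * ln(O / E) over observed cells."""
    z = sorted(z)
    rows = list(zip(data[x], data[y], *[data[v] for v in z]))
    n_xyz = Counter(rows)
    n_xz = Counter((r[0],) + r[2:] for r in rows)
    n_yz = Counter((r[1],) + r[2:] for r in rows)
    n_z = Counter(r[2:] for r in rows)
    total = 0.0
    for cell, observed in n_xyz.items():
        zs = cell[2:]
        # Expected count under conditional independence of X and Y given Z.
        expected = n_xz[(cell[0],) + zs] * n_yz[(cell[1],) + zs] / n_z[zs]
        total += observed * math.log(observed / expected)
    return 2.0 * total


data = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 1, 1, 1, 0, 0, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1],
}
cache = EntropyCache(data)
g_ab = g_statistic(cache, "a", "b", ["c"])   # computes four joint entropies
g_ba = g_statistic(cache, "b", "a", ["c"])   # reuses all four from the cache
```

In this sketch the second test, on a permutation of the same features, triggers no new entropy computation, which is the kind of reuse the abstract refers to. Turning the statistic into a p-value would additionally require the degrees of freedom and a chi-squared distribution, which are omitted here.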

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/23/11/1501/pdf

Cited by 1 article.

1. Analyzing Markov Boundary Discovery Algorithms in Ideal Conditions Using the d-Separation Criterion;Algorithms;2022-03-23