Counterfactual Supervision-Based Information Bottleneck for Out-of-Distribution Generalization-Reference-Cited by-同舟云学术

Counterfactual Supervision-Based Information Bottleneck for Out-of-Distribution Generalization

Published:2023-01-18 Issue:2 Volume:25 Page:193
ISSN:1099-4300
Container-title:Entropy
language:en
Short-container-title:Entropy

Author:

Deng Bin^ORCID,Jia Kui^ORCID

Abstract

Learning invariant (causal) features for out-of-distribution (OOD) generalization have attracted extensive attention recently, and among the proposals, invariant risk minimization (IRM) is a notable solution. In spite of its theoretical promise for linear regression, the challenges of using IRM in linear classification problems remain. By introducing the information bottleneck (IB) principle into the learning of IRM, the IB-IRM approach has demonstrated its power to solve these challenges. In this paper, we further improve IB-IRM from two aspects. First, we show that the key assumption of support overlap of invariant features used in IB-IRM guarantees OOD generalization, and it is still possible to achieve the optimal solution without this assumption. Second, we illustrate two failure modes where IB-IRM (and IRM) could fail in learning the invariant features, and to address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that recovers the invariant features. By requiring counterfactual inference, CSIB works even when accessing data from a single environment. Empirical experiments on several datasets verify our theoretical results.

Funder

Guangdong R&D key project of China

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/25/2/193/pdf

Reference58 articles.

1. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. arXiv.

2. Rosenfeld, A., Zemel, R., and Tsotsos, J.K. (2018). The elephant in the room. arXiv.

3. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., and Brendel, W. (2019, January 6–9). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA.

4. Nguyen, A., Yosinski, J., and Clune, J. (2015, January 7–12). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Proceedings of the Computer Vision and Pattern Recognition Conference, Boston, MA, USA.

5. Gururangan, S., Swayamdipta, S., Levy, O., Schwartz, R., Bowman, S.R., and Smith, N.A. (2018). Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Association for Computational Linguistics.