Affiliation:
1. Department of Electrical Engineering and Computer Science, University of Arkansas, Fayetteville, AR 72701, USA
Abstract
Investigating causality to establish novel criteria for training robust natural language processing (NLP) models is an active research area. However, current methods face challenges such as the difficulty of identifying keyword lexicons and of obtaining data from multiple labeled environments. In this paper, we study the problem of robust NLP from a complementary but different angle: we treat the behavior of an attack model as a complex causal mechanism and quantify its algorithmic information using the minimum description length (MDL) framework. Specifically, we use masked language modeling (MLM) to measure the “amount of effort” needed to transform the original text into the altered text. Based on this measure, we develop techniques for judging whether a specified set of tokens has been altered by the attack, even in the absence of the original text.
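To make the MLM-based measurement concrete, the sketch below scores a sentence by the number of bits a masked language model needs to encode its tokens: each position is masked in turn and charged −log2 p(token | context), an MDL-style code length. This is a minimal illustration under our own assumptions, not the paper’s implementation; the bert-base-uncased checkpoint, the per-token masking scheme, and the one-token-at-a-time scoring loop are all illustrative choices.

```python
import math

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical checkpoint; the abstract does not name a specific MLM.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def description_length_bits(text: str, positions=None) -> float:
    """MDL-style proxy: sum of -log2 p(token | masked context) over the
    scored positions. Higher totals mean the MLM finds the tokens harder
    to predict, i.e. a longer code length."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    # Score every position between [CLS] and [SEP] unless a subset is given.
    scored = positions if positions is not None else range(1, len(input_ids) - 1)
    total_bits = 0.0
    for pos in scored:
        masked = input_ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id  # hide the token being scored
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, pos]
        log_prob = torch.log_softmax(logits, dim=-1)[true_id].item()
        total_bits += -log_prob / math.log(2)  # nats -> bits
    return total_bits

# A perturbed sentence should typically cost more bits than a natural one;
# thresholding per-token bits can flag tokens that were likely altered,
# without access to the original text.
print(description_length_bits("The movie was surprisingly good."))
```

Under this reading, an attack that substitutes unnatural tokens lengthens the code the MLM assigns, so comparing per-token bit costs against a threshold gives a reference-free signal of which tokens were altered.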