Post-hoc Interpretability for Neural NLP: A Survey

Published:2022-12-23 Issue:8 Volume:55 Page:1-42
ISSN:0360-0300
Container-title:ACM Computing Surveys
language:en
Short-container-title:ACM Comput. Surv.

Author:

Madsen Andreas¹, Reddy Siva², Chandar Sarath³

Affiliation:

1. Mila & Polytechnic Montreal, Montréal, Quebec, Canada

2. Mila & McGill, Montréal, QC, Canada

3. Mila & Polytechnique Montreal, Montreal, Quebec, Canada

Abstract

Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether these models are responsible to use. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey categorizes how recent post-hoc interpretability methods communicate explanations to humans and discusses each method in depth, including how it is validated, since validation is a common concern.

Funder

Canada CIFAR AI Chairs program

NSERC Discovery

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3546577

Reference: 138 articles.

1. Quantifying Attention Flow in Transformers

2. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)

3. Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. 2018. Sanity checks for saliency maps. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc., 9505–9515.

4. Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of the International Conference on Learning Representations. 1–12.

5. A causal framework for explaining the predictions of black-box sequence-to-sequence models

Cited by 70 articles.

1. Cleanformer: A Confident Learning Based ERP Label Denoising Framework for Public Attitude Assessment to Recycled Water;Water Resources Management;2024-09-07

2. CAT: Interpretable Concept-based Taylor Additive Models;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

3. A survey on advancements in image–text multimodal models: From general techniques to biomedical implementations;Computers in Biology and Medicine;2024-08

4. The Use of Machine Learning Methods in Political Science: An In-Depth Literature Review;Political Studies Review;2024-07-30

5. From outputs to insights: a survey of rationalization approaches for explainable text classification;Frontiers in Artificial Intelligence;2024-07-23