Affiliation:
1. University of Pennsylvania, Department of Computer and Information Science. lyuqing@sas.upenn.edu
2. University of Pennsylvania, Department of Computer and Information Science. marapi@seas.upenn.edu
3. University of Pennsylvania, Department of Computer and Information Science. ccb@seas.upenn.edu
Abstract
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, that is, an explanation should accurately represent the reasoning process behind the model’s prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.
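To make one of the five categories concrete, the following is a minimal sketch of a backpropagation-based explanation (gradient × input saliency), one of the method families the survey reviews. The toy vocabulary, embedding layer, and linear classifier here are illustrative assumptions for a self-contained example, not a model or code from the paper itself.

```python
# Minimal sketch: gradient x input saliency for a toy text classifier.
# All model components below are hypothetical, chosen only to keep the
# example self-contained and runnable.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}
embed = nn.Embedding(len(vocab), 8)
classifier = nn.Linear(8, 2)  # two sentiment classes

tokens = ["the", "movie", "was", "great"]
ids = torch.tensor([[vocab[t] for t in tokens]])

# Embed tokens and retain the gradient w.r.t. the embedding activations.
emb = embed(ids)                      # shape: (1, seq_len, 8)
emb.retain_grad()
logits = classifier(emb.mean(dim=1))  # mean-pool, then classify
pred = logits.argmax(dim=-1).item()

# Backpropagate the predicted-class logit down to the input embeddings.
logits[0, pred].backward()

# Gradient x input, summed over the embedding dimension, yields one
# importance score per input token.
saliency = (emb.grad * emb).sum(dim=-1).squeeze(0)
for tok, score in zip(tokens, saliency.tolist()):
    print(f"{tok:>8s}: {score:+.4f}")
```

Whether such saliency scores faithfully reflect the model's reasoning is exactly the question the survey examines when comparing backpropagation-based methods against the other four categories.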