Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP

Author:

Schulte Lukas, Ledel Benjamin, Herbold Steffen

Abstract

Context: The identification of bugs within issues reported to an issue tracking system is crucial for triage. Machine learning models have shown promising results for this task. However, we have only limited knowledge of how such models identify bugs. Explainable AI methods like LIME and SHAP can be used to increase this knowledge.

Objective: We want to understand if explainable AI provides explanations that are reasonable to us as humans and align with our assumptions about the model’s decision-making. We also want to know if the quality of predictions is correlated with the quality of explanations.

Methods: We conduct a study where we rate LIME and SHAP explanations based on their quality of explaining the outcome of an issue type prediction model. For this, we rate the quality of the explanations, i.e., if they align with our expectations and help us understand the underlying machine learning model.

Results: We found that both LIME and SHAP give reasonable explanations and that correct predictions are well explained. Further, we found that SHAP outperforms LIME due to a lower ambiguity and a higher contextuality that can be attributed to the ability of the deep SHAP variant to capture sentence fragments.

Conclusion: We conclude that the model finds explainable signals for both bugs and non-bugs. Also, we recommend that research dealing with the quality of explanations for classification tasks reports and investigates rater agreement, since the rating of explanations is highly subjective.

Funder

Universität Passau

Publisher

Springer Science and Business Media LLC

