Abstract
Although deep learning has become state of the art for numerous tasks, it remains largely untapped in many specialized domains. High-stakes environments such as medical settings pose additional challenges for deep learning algorithms because of trust and safety concerns. In this work, we propose to address these issues by evaluating the performance and explainability of a Bidirectional Encoder Representations from Transformers (BERT) model for the task of medical image protocol assignment. Specifically, we fine-tune a pre-trained BERT model on this medical image protocol classification task and measure word importance by attributing the classification output to every word through a gradient-based method. We then have a trained radiologist review the resulting word importance scores and assess the validity of the model's decision-making process in comparison to that of a human. Our results indicate that the BERT model is able to identify relevant words that are highly indicative of the target protocol. Furthermore, by analyzing the important words in misclassifications, we reveal potential systematic errors in the model that may be addressed to improve its safety and suitability for use in a clinical setting.
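The attribution workflow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes Hugging Face Transformers with a `bert-base-uncased` checkpoint, a hypothetical number of protocol classes, and gradient-times-input as a stand-in for the paper's gradient-based attribution method.

```python
# Minimal sketch: BERT sequence classifier plus gradient-x-input word attributions.
# Assumptions (not from the paper): Hugging Face transformers, bert-base-uncased,
# and a made-up number of protocol classes; the classifier head here is untrained
# and would first be fine-tuned on labeled protocol-assignment text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"   # assumed base checkpoint
NUM_PROTOCOLS = 20                 # hypothetical number of imaging protocols

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_PROTOCOLS
)
model.eval()

def word_importance(text: str):
    """Attribute the predicted protocol's logit to each input token (grad x input)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Look up the word embeddings explicitly so we can differentiate w.r.t. them;
    # BERT adds positional/segment embeddings internally when given inputs_embeds.
    embeds = model.bert.embeddings.word_embeddings(enc["input_ids"])
    embeds.retain_grad()
    out = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
    pred = out.logits.argmax(dim=-1).item()
    out.logits[0, pred].backward()
    # Collapse gradient-x-input over the embedding dimension -> one score per token.
    scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for tok, s in word_importance("MRI brain without contrast, rule out stroke"):
    print(f"{tok:>15s}  {s:+.4f}")
```

In a setting like the one the abstract describes, the per-token scores produced this way would be the quantities shown to a radiologist for review; the example request text above is purely illustrative.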
Publisher
Cold Spring Harbor Laboratory