Abstract
Background
Deep neural networks (DNNs) show excellent performance in interpreting electrocardiograms (ECGs), both for conventional ECG interpretation and for novel applications such as detection of reduced ejection fraction and prediction of one-year mortality. Despite these promising developments, clinical implementation is severely hampered by the lack of trustworthy techniques to explain the decisions of the algorithm to clinicians. In particular, the currently employed heatmap-based methods have been shown to be inaccurate.

Methods
We present a novel approach that is inherently explainable and uses an unsupervised variational auto-encoder (VAE) to learn the underlying factors of variation of the ECG (the FactorECG) in a database of 1.1 million ECG recordings. These factors are subsequently used in a pipeline with common and interpretable statistical methods. As the ECG factors are explainable by generating and visualizing ECGs on both the model and individual patient level, the pipeline becomes fully explainable. The performance of the pipeline is compared to a state-of-the-art ‘black box’ DNN in three tasks: conventional ECG interpretation with 35 diagnostic statements, detection of reduced ejection fraction, and prediction of one-year mortality.

Findings
The VAE was able to compress the ECG into 21 generative ECG factors, which are associated with physiologically valid underlying anatomical and (patho)physiological processes. When applied to the three tasks, the explainable FactorECG pipeline performed similarly to state-of-the-art ‘black box’ DNNs in conventional ECG interpretation (AUROC 0·94 vs 0·96), detection of reduced ejection fraction (AUROC 0·90 vs 0·91) and prediction of one-year mortality (AUROC 0·76 vs 0·75). In contrast to the state of the art, our pipeline provided inherent explainability of which morphological ECG features were important for prediction or diagnosis.

Interpretation
Future studies should employ DNNs that are inherently explainable to facilitate clinical implementation by building confidence in artificial intelligence and, more importantly, by making it possible to identify biased or inaccurate models.

Funding
This study was financed by the Netherlands Organisation for Health Research and Development (ZonMw, no. 104021004) and the Dutch Heart Foundation (no. 2019B011).
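To make the structure of the pipeline concrete, the following is a minimal sketch of the three components summarized in the Methods: a VAE that compresses each ECG into 21 factors, an ordinary logistic regression on those factors, and AUROC evaluation. All architectural details (layer sizes, input shape, and names such as ECGVAE and encode_to_factors) are illustrative assumptions, not the authors' implementation.

# Sketch of a FactorECG-style pipeline: (i) VAE compresses each ECG into
# 21 factors, (ii) the factors feed an interpretable classifier,
# (iii) performance is reported as AUROC. Shapes and sizes are assumed.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

N_FACTORS = 21                 # number of generative ECG factors in the paper
N_LEADS, N_SAMPLES = 8, 600    # assumed shape of one ECG recording

class ECGVAE(nn.Module):
    """1D-convolutional VAE: ECG -> 21 latent factors -> reconstructed ECG.
    In practice it would be trained on the ECG database by minimizing the
    usual ELBO (reconstruction error plus KL divergence); omitted here."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(N_LEADS, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(), nn.Flatten())
        flat = 64 * (N_SAMPLES // 4)
        self.to_mu = nn.Linear(flat, N_FACTORS)
        self.to_logvar = nn.Linear(flat, N_FACTORS)
        self.decoder = nn.Sequential(
            nn.Linear(N_FACTORS, flat), nn.Unflatten(1, (64, N_SAMPLES // 4)),
            nn.ConvTranspose1d(64, 32, 8, stride=2, padding=3), nn.ReLU(),
            nn.ConvTranspose1d(32, N_LEADS, 8, stride=2, padding=3))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

@torch.no_grad()
def encode_to_factors(vae, ecgs):
    """Use the posterior mean as the FactorECG representation of each ECG."""
    return vae.to_mu(vae.encoder(ecgs)).numpy()

# Downstream task (e.g. detection of reduced ejection fraction): a plain
# logistic regression on the 21 factors, evaluated with AUROC. Random data
# stands in for a pretrained VAE and a real labelled ECG set.
vae = ECGVAE().eval()
x_train, y_train = torch.randn(256, N_LEADS, N_SAMPLES), np.random.randint(0, 2, 256)
x_test, y_test = torch.randn(64, N_LEADS, N_SAMPLES), np.random.randint(0, 2, 64)
clf = LogisticRegression(max_iter=1000).fit(encode_to_factors(vae, x_train), y_train)
auroc = roc_auc_score(y_test, clf.predict_proba(encode_to_factors(vae, x_test))[:, 1])
print(f"AUROC: {auroc:.2f}")

Because the downstream model is a plain logistic regression, its coefficients (clf.coef_) directly quantify how much each ECG factor contributes to a diagnosis or prediction, which is what makes the pipeline interpretable.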
Research in Context

Evidence before this study
A comprehensive literature survey was performed for research articles on interpretable or explainable artificial intelligence (AI) for the interpretation of raw electrocardiograms (ECGs) using the PubMed and Google Scholar databases. Articles in English up to November 24, 2021, were included, and the following keywords were used: deep neural network (DNN), deep learning, convolutional neural network, artificial intelligence, electrocardiogram, ECG, explainability, explainable, interpretability, interpretable, and visualization. Many studies that used DNNs to interpret ECGs with high predictive performance were found, some focusing on tasks known to be associated with the ECG (e.g., rhythm disorders) and others identifying completely novel use cases for the ECG (e.g., reduced ejection fraction). All of these studies employed post-hoc explainability techniques, in which the decisions of the ‘black box’ DNN were visualized after training, usually using heatmaps (e.g., Grad-CAM, SHAP or LIME). In these studies, only a few hand-picked example ECGs were shown, as these heatmap-based techniques only work on single ECGs. Three studies also investigated global features of the model by taking a summary measure of the heatmaps, by relating heatmaps to known ECG parameters (e.g., QRS duration) or by using prototypes. No studies investigated whether the features found using heatmaps were robust or reproducible.

Added value of this study
Currently employed post-hoc explainability techniques, usually heatmap-based, have limited explanatory value, as they merely indicate the temporal location of a specific feature in the individual ECG. Moreover, these techniques have been shown to be unreliable, poorly reproducible, and prone to confirmation bias. To address this gap in knowledge, we designed a DNN that is inherently explainable (i.e., explainable by design rather than explained post hoc). This DNN is used in a pipeline that consists of three components: (i) a generative DNN (a variational auto-encoder) that learned to encode the ECG into its 21 underlying continuous factors of variation (the FactorECG), (ii) a visualization technique that provides insight into these ECG factors, and (iii) a common interpretable statistical method that performs diagnosis or prediction using the ECG factors. Model-level explainability is obtained by varying the ECG factors while generating and plotting ECGs, which allows visualization of detailed changes in morphology that are associated with physiologically valid underlying anatomical and (patho)physiological processes. Individual patient-level explanations are also possible, as every individual ECG has its own representative set of explainable FactorECG values, of which the associations with the outcome are known. When the explainable pipeline was used for interpretation of diagnostic ECG statements, detection of reduced ejection fraction and prediction of one-year mortality, it yielded predictive performances similar to state-of-the-art ‘black box’ DNNs. In contrast to the state of the art, our pipeline provided inherent explainability of which ECG features were important for prediction or diagnosis. For example, ST elevation was discovered to be an important predictor of reduced ejection fraction, an important finding as it could limit the generalizability of the algorithm to the general population.

Implications of all the available evidence
A longstanding assumption was that the high-dimensional and non-linear ‘black box’ nature of the currently applied ECG-based DNNs was unavoidable if the impressive performance shown by these algorithms on conventional and novel use cases was to be attained. This study, however, shows that inherently explainable DNNs should be the future of ECG interpretation, as they allow reliable clinical interpretation of these models without a reduction in performance, while also broadening their applicability to the detection of novel features in many other (rare) diseases. The application of such methods will lead to more confidence in DNN-based ECG analysis, which will facilitate the clinical implementation of DNNs in routine clinical practice.
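The two forms of explainability described above can be sketched as follows, reusing the hypothetical ECGVAE, clf, encode_to_factors and x_test from the previous sketch: a latent traversal generates ECGs while varying a single factor (model-level insight), and one patient's factor values weighted by the regression coefficients give a patient-level explanation. Factor indices and value ranges are illustrative assumptions.

# Model-level explainability (illustrative): vary one ECG factor over a
# range while keeping all other factors at the prior mean, decode each
# latent vector into an ECG, and plot the generated morphologies.
import matplotlib.pyplot as plt
import torch

@torch.no_grad()
def plot_factor_traversal(vae, factor_idx, values=(-3.0, -1.5, 0.0, 1.5, 3.0), lead=0):
    for v in values:
        z = torch.zeros(1, N_FACTORS)    # remaining factors at the prior mean
        z[0, factor_idx] = v             # traverse the factor of interest
        ecg = vae.decoder(z)             # generate the corresponding ECG
        plt.plot(ecg[0, lead].numpy(), label=f"factor {factor_idx} = {v:+.1f}")
    plt.xlabel("sample"); plt.ylabel("amplitude (a.u.)")
    plt.title(f"ECG morphology controlled by factor {factor_idx}")
    plt.legend(); plt.show()

plot_factor_traversal(vae, factor_idx=5)  # illustrative factor index

# Patient-level explanation (illustrative): one ECG's factor values weighted
# by the logistic-regression coefficients give per-factor contributions to
# the prediction for that individual patient.
contrib = encode_to_factors(vae, x_test[:1])[0] * clf.coef_[0]
print({f"factor_{i}": round(float(c), 3) for i, c in enumerate(contrib)})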
Publisher
Cold Spring Harbor Laboratory