Abstract
The multi-lead electrocardiogram (ECG) is widely used in the clinical diagnosis and monitoring of cardiac conditions. With the development of deep learning, automated multi-lead ECG diagnostic networks have come to play a crucial role in biomedical engineering and clinical cardiac disease diagnosis. Methods for intelligent ECG diagnosis include Recurrent Neural Networks (RNNs), Transformers, and Convolutional Neural Networks (CNNs), but each has limitations. A CNN can extract local spatial features of images, but it cannot learn global spatial features or temporal memory features. An RNN operates over time steps and can retain important sequence features, but in practice it cannot effectively capture long-range dependencies in sequence data. The self-attention mechanism in the Transformer can extract global features, but it does not adequately prioritize local features and lacks spatial and channel feature extraction capabilities. In this study, we propose STFAC-ECGNet, which comprises a CAMV-RNN block, a CBMV-CNN block, and a TSEF block and combines the advantages of CNNs, RNNs, and Transformers. The CAMV-RNN block introduces a coordinated adaptive simplified self-attention module, which adaptively memorizes global sequence features and enhances spatial-temporal information. The CBMV-CNN block integrates spatial and channel attention modules in a skip connection to combine spatial and channel information. The TSEF block performs enhanced multi-scale fusion of image spatial features and sequence temporal features. Comprehensive experiments were conducted on the large, publicly available PTB-XL ECG dataset and the China Physiological Signal Challenge 2018 (CPSC2018) database. The results demonstrate that STFAC-ECGNet outperforms other state-of-the-art methods on multiple tasks and exhibits robustness and generalization.
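The contrast drawn above between local convolutional features and global self-attention can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the STFAC-ECGNet modules themselves: the function names `conv1d_same` and `self_attention` are hypothetical, the signal is a single synthetic lead containing one impulse, and the attention uses identity projections on scalar samples rather than learned weights.

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Each output sample depends only on the inputs inside the kernel window,
    which is the "local feature" behaviour attributed to CNNs.
    Assumes an odd kernel length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head dot-product self-attention on scalar tokens.

    Illustrative assumption: query, key, and value are the raw samples
    (identity projections), so the scale factor sqrt(d) is 1.
    Each output sample is a weighted mix of ALL input samples.
    """
    out = []
    for q in x:
        weights = softmax([q * k for k in x])
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out


if __name__ == "__main__":
    # Synthetic single-lead signal: one impulse in the middle (hypothetical data).
    signal = [0.0] * 9
    signal[4] = 1.0

    local = conv1d_same(signal, [1.0, 1.0, 1.0])
    global_mix = self_attention(signal)

    # The convolution responds only next to the impulse.
    print("conv     :", local)
    # Attention spreads the impulse to every position, including distant ones.
    print("attention:", [round(v, 3) for v in global_mix])
```

In this sketch the convolution output is nonzero only within one sample of the impulse, whereas every attention output is influenced by it. That is the motivation the abstract gives for combining the two kinds of feature extractor; the actual blocks in the paper additionally use learned weights, recurrence, and spatial and channel attention, none of which are modelled here.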