Abstract
Supervised machine learning learns a mapping from input data to output labels, based on the patterns and relationships present in a large labelled training dataset. Obtaining labelled data generally requires a substantial investment of cost and time. In such scenarios, weakly supervised learning techniques such as data programming (DP) and active learning (AL) can be advantageous for time-series classification tasks. These paradigms can assign labels to data automatically, and time-series classification can subsequently be carried out on the labelled data. This work proposes a novel framework titled AL enhanced data programming (ActDP), which combines DP and AL for electrocardiogram (ECG) beat classification using single-lead data. ECG beat classification is pivotal in cardiology and healthcare applications for diagnosing a broad spectrum of heart conditions and arrhythmias. To establish the usefulness of the proposed ActDP framework, experiments were conducted on the MIT-BIH dataset with 94,224 ECG beats. In this work, DP assigns a probabilistic label to each ECG beat using nine novel polar labelling functions and a generative model. AL then improves on the result of DP by replacing the generative model's labels for sampled ECG beats with ground truth. Subsequently, a discriminative model is trained on these labels in each iteration. The experimental results show that by incorporating AL into DP in the ActDP framework, the accuracy of ECG classification increases strictly from 85.7% to 97.34% over 58 iterations. The proposed framework thus achieves a higher classification accuracy (97.34%) than the compared approaches: DP with data augmentation (DA) achieves 92.2%, DP without DA achieves 85.7%, few-shot learning techniques yield 87.5%–89.2%, and multi-instance learning methods achieve 88.9%–94.1%.
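The label-replacement step of the loop described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes binary labels, three toy labelling functions on synthetic two-dimensional data (not the nine polar labelling functions used in this work), a simple vote fraction as a stand-in for the generative model, and uncertainty sampling as the AL query strategy. All function names are hypothetical, and the discriminative model is omitted.

```python
import numpy as np

def apply_labeling_functions(X, lfs):
    """Stack the +1 / -1 votes of each labelling function into an (n, m) matrix."""
    return np.column_stack([lf(X) for lf in lfs])

def soft_labels_from_votes(votes):
    """Stand-in for the generative model (assumption): the fraction of
    positive votes per sample, read as P(y = 1)."""
    return (votes == 1).mean(axis=1)

def active_label_replacement(probs, y_true, labelled, batch_size):
    """Replace the most uncertain, not-yet-queried soft labels with ground truth."""
    probs = probs.copy()
    labelled = labelled.copy()
    closeness = -np.abs(probs - 0.5)      # larger means closer to 0.5
    closeness[labelled] = -np.inf         # never query the same sample twice
    query = np.argsort(closeness)[-batch_size:]
    probs[query] = y_true[query]          # ground truth overrides the soft label
    labelled[query] = True
    return probs, labelled

# Synthetic data standing in for ECG beat features (assumption).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Toy labelling functions: each votes +1 or -1 and never abstains.
lfs = [
    lambda X: np.where(X[:, 0] > 0, 1, -1),
    lambda X: np.where(X[:, 1] > 0, 1, -1),
    lambda X: np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1),
]

votes = apply_labeling_functions(X, lfs)
probs = soft_labels_from_votes(votes)
labelled = np.zeros(n, dtype=bool)

accuracy_per_round = []
for _ in range(5):
    probs, labelled = active_label_replacement(probs, y, labelled, batch_size=20)
    hard = (probs >= 0.5).astype(int)
    accuracy_per_round.append(float((hard == y).mean()))
```

In this sketch the accuracy of the label set cannot decrease from one round to the next, because each round only overwrites soft labels with ground truth. In the full framework, a discriminative model would be retrained on `probs` after every round.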