Abstract
Aim
This study applies a novel hybrid framework combining a Vision Transformer (ViT) with a bidirectional long short-term memory (Bi-LSTM) model to classify physical activity intensity (PAI) in adults from gravity-based acceleration. It also investigates how PAI and temporal window (TW) affect the model's accuracy.
Method
This research used the Capture-24 dataset, consisting of raw accelerometer data from 151 participants aged 18 to 91. Gravity-based acceleration was utilised to generate images encoding various PAIs. These images were subsequently analysed using the ViT-BiLSTM model, with results presented in confusion matrices and compared with baseline models. The model's robustness was evaluated through temporal stability testing and examination of accuracy and loss curves.
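As an illustration of the temporal-window step, the following minimal sketch splits a tri-axial acceleration recording into non-overlapping windows of a given length. It is not the authors' pipeline: the function name `segment_windows`, the 100 Hz sampling rate, the non-overlapping scheme, and the synthetic recording are all assumptions made for the example, and the image-encoding step is not shown because its details are not given here.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in units of g


def segment_windows(
    samples: Sequence[Sample], sample_rate_hz: int, window_s: int
) -> List[List[Sample]]:
    """Split a recording into non-overlapping windows of `window_s` seconds.

    Hypothetical helper: a trailing partial window is dropped.
    """
    window_len = sample_rate_hz * window_s
    if window_len <= 0:
        raise ValueError("sample_rate_hz and window_s must be positive")
    n_windows = len(samples) // window_len
    return [
        list(samples[i * window_len:(i + 1) * window_len])
        for i in range(n_windows)
    ]


# Synthetic 60 s recording at an assumed 100 Hz: device at rest, 1 g on the z axis.
ASSUMED_RATE_HZ = 100
recording = [(0.0, 0.0, 1.0)] * (60 * ASSUMED_RATE_HZ)

# Number of windows produced for each of the five TWs named in the abstract.
window_counts = {
    tw: len(segment_windows(recording, ASSUMED_RATE_HZ, tw))
    for tw in (1, 5, 10, 15, 30)
}
print(window_counts)  # {1: 60, 5: 12, 10: 6, 15: 4, 30: 2}
```

Shorter windows yield many more training samples from the same recording, which is one reason TW length may influence a classifier's accuracy.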
Results
The ViT-BiLSTM model performed strongly in the PAI classification task, achieving an overall accuracy of 98.5% ± 1.48% across five TWs: 98.7% for 1 s, 98.1% for 5 s, 98.2% for 10 s, 99% for 15 s, and 98.65% for 30 s. Accuracy was higher for sedentary behaviour (98.9% ± 1%) than for light physical activity (98.2% ± 2%) and moderate-to-vigorous physical activity (98.2% ± 3%), although ANOVA showed no significant variation in accuracy across PAIs (F = 2.18, p = 0.13) or TWs (F = 0.52, p = 0.72). Accuracy and loss curves showed that the model's performance improved consistently across epochs, supporting its robustness.
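For readers unfamiliar with the test, a one-way ANOVA F statistic of the kind reported above can be sketched as follows. The helper `one_way_anova_f` and the toy values are assumptions for illustration only; they are not the study's data and do not reproduce its F values. Obtaining the p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper: F is the ratio of between-group to
    within-group mean squares.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy accuracy values (%) for three groups; NOT results from the study.
toy_groups = [
    [98.0, 99.0, 100.0],
    [97.0, 98.0, 99.0],
    [96.0, 97.0, 98.0],
]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 3.00
```

A small F, as reported for both PAI and TW, means the differences between group means are small relative to the spread within each group.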
Conclusion
This study demonstrates the ViT-BiLSTM model's efficacy in classifying PAI using gravity-based acceleration, with performance remaining consistent across diverse TWs and intensities. However, PAI and TW could still cause slight variations in the model's performance. Future research should investigate the impact of gravity-based acceleration on PAI thresholds, which may influence the model's robustness and reliability.