Abstract
Human activity recognition (HAR) has gained significant attention in computer vision and human‐computer interaction. This paper investigates the core difficulty in HAR: accurately differentiating between activities by extracting both spatial and temporal features from sequential data. Traditional machine learning approaches require manual feature extraction, which limits their effectiveness. For temporal modeling, recurrent neural networks (RNNs) have been widely used in HAR; however, they struggle to process long sequences, leading to information bottlenecks. This work introduces a framework that integrates spatial and temporal features through a series of layers incorporating a self‐attention mechanism to overcome these problems. Spatial characteristics are extracted using 1D convolutions coupled with pooling layers, which capture the essential spatial information. Gated recurrent units (GRUs) then model the temporal dynamics inherent in the sequential data. Finally, an attention mechanism dynamically weights the significant segments of the sequence, improving the model’s contextual understanding and enhancing the effectiveness of deep neural networks (DNNs) for HAR. Three optimizers, namely Adam, SGD, and RMSprop, were employed to train the model, each tested with three learning rates: 0.1, 0.001, and 0.0001. Experiments on the UCI‐HAR dataset show that the model performs well, achieving 97% accuracy with the Adam optimizer and a learning rate of 0.001.
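To make the described pipeline concrete, the following is a minimal NumPy sketch of its forward pass: 1D convolution and pooling for spatial features, a (simplified, bias-free) GRU for temporal dynamics, and a softmax attention pooling over timesteps. All layer sizes, the kernel width, and the single-vector attention scoring are illustrative assumptions; the abstract does not specify these hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    # x: (T, C_in), w: (k, C_in, C_out), b: (C_out,) -- valid convolution
    k = w.shape[0]
    T = x.shape[0] - k + 1
    return np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
                     for t in range(T)])

def max_pool(x, size=2):
    # non-overlapping max pooling along the time axis
    T = x.shape[0] // size
    return x[:T * size].reshape(T, size, -1).max(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru(x, h0, Wz, Uz, Wr, Ur, Wh, Uh):
    # simplified GRU cell (no biases); returns hidden states for all timesteps
    h, hs = h0, []
    for xt in x:
        z = sigmoid(xt @ Wz + h @ Uz)              # update gate
        r = sigmoid(xt @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(xt @ Wh + (r * h) @ Uh)  # candidate state
        h = (1 - z) * h + z * h_tilde
        hs.append(h)
    return np.stack(hs)

def attention_pool(hs, v):
    # score each timestep with a learned vector v, softmax, weighted sum
    scores = hs @ v
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ hs, a

# Toy input shaped like one UCI-HAR window: 128 timesteps, 9 sensor channels.
T, C_in, C_out, H, k = 128, 9, 16, 8, 3
x = rng.standard_normal((T, C_in))
w = rng.standard_normal((k, C_in, C_out)) * 0.1
feat = max_pool(conv1d(x, w, np.zeros(C_out)))       # spatial features: (63, 16)
params = [rng.standard_normal((d, H)) * 0.1 for d in (C_out, H) * 3]
hs = gru(feat, np.zeros(H), *params)                 # temporal features: (63, 8)
ctx, att = attention_pool(hs, rng.standard_normal(H))  # context vector: (8,)
```

In a trained model the convolution kernels, GRU matrices, and attention vector would be learned (e.g., with Adam at learning rate 0.001, as in the experiments); here they are random, so only the shapes and data flow are meaningful.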