Abstract
Human interaction and computer vision converge in the realm of Human Activity Recognition (HAR), a research field dedicated to the creation of automated systems capable of observing and categorizing human activities. This domain is closely aligned with machine learning, involving the development of algorithms and models that learn to recognize and classify patterns within data. HAR typically unfolds in two pivotal phases: data acquisition and processing, followed by activity classification. In the first phase, data are gathered from various sensors or video sources, such as accelerometers, smartphones, or smartwatches, and then preprocessed to extract relevant features. The second phase, activity classification, employs machine learning algorithms to categorize these extracted features into distinct activity types, ranging from walking and running to sitting. This paper introduces an innovative approach grounded in these two fundamental phases. For the first phase, we leverage the MediaPipe framework to detect human articulations. Once a pose is detected, we extract the coordinates of each articulation and transform them into graphs, where nodes represent the articulation coordinates and edges represent the connections between them. In the second phase, we enhance existing methodologies by incorporating a diverse set of machine learning models; notably, the use of Graph Neural Networks (GNNs) stands out as a significant advancement. This choice proves instrumental in learning and representing complex spatial and temporal patterns, surpassing the limitations of conventional machine learning algorithms. The developed system is evaluated on the KTH and UCF50 datasets, demonstrating state-of-the-art performance in HAR.
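To make the first phase concrete, the following is a minimal sketch (not the authors' code) of how MediaPipe pose landmarks can be turned into the kind of graph the abstract describes: nodes carry joint coordinates and edges follow the skeleton connections. It assumes MediaPipe Pose and PyTorch Geometric as the graph container; the function name frame_to_graph is hypothetical.

```python
import cv2
import mediapipe as mp
import torch
from torch_geometric.data import Data

mp_pose = mp.solutions.pose


def frame_to_graph(bgr_frame):
    """Detect a pose in one video frame and return it as a graph, or None."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None  # no person detected in this frame

    # Node features: (x, y, z, visibility) for each detected landmark.
    x = torch.tensor(
        [[lm.x, lm.y, lm.z, lm.visibility]
         for lm in results.pose_landmarks.landmark],
        dtype=torch.float,
    )

    # Edges: MediaPipe's predefined skeleton connections, made bidirectional.
    edges = []
    for a, b in mp_pose.POSE_CONNECTIONS:
        edges.append([a, b])
        edges.append([b, a])
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()

    return Data(x=x, edge_index=edge_index)
```

A sequence of such per-frame graphs can then be fed to a GNN-based classifier in the second phase.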