Affiliation:
1. Faculty of Engineering, Multimedia University, Cyberjaya 63100, Selangor, Malaysia
2. Computer Science, New York University Abu Dhabi, P.O. Box 1291888, Abu Dhabi, United Arab Emirates
Abstract
Videos from a first-person or egocentric perspective offer a promising tool for recognizing various activities of daily living. In the egocentric perspective, the video is obtained from a wearable camera, which captures the person’s activities from a consistent viewpoint. Recognizing activities from a wearable camera is challenging for several reasons, such as motion blur and large variations. Existing methods are based on extracting handcrafted features from video frames to represent their content. These features are domain-dependent: features suited to one dataset may not be suitable for others. In this paper, we propose a novel solution for recognizing daily living activities from a pre-segmented video clip. A pre-trained convolutional neural network (CNN), VGG16, is used to extract visual features from sampled video frames, and these features are then aggregated by the proposed pooling scheme. The proposed solution combines appearance and motion features extracted from video frames and optical flow images, respectively. Mean and max spatial pooling (MMSP) and temporal pyramid max mean (TPMM) pooling are proposed to compose the final video descriptor. The descriptor is fed to a linear support vector machine (SVM) to recognize the type of activity observed in the video clip. The proposed solution was evaluated on three public benchmark datasets, and we performed studies to show the advantage of aggregating appearance and motion features for daily activity recognition. The results show that the proposed solution is promising for recognizing activities of daily living. Compared with several methods on the three public datasets, the proposed MMSP–TPMM method produces higher classification performance in terms of accuracy (90.38% on the LENA dataset, 75.37% on the ADL dataset, and 96.08% on the FPPA dataset) and average per-class precision (AP) (58.42% on the ADL dataset and 96.11% on the FPPA dataset).
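As a rough illustration of the pipeline described in the abstract, the sketch below extracts VGG16 convolutional features per sampled frame, applies mean+max spatial pooling (MMSP-style), aggregates frames with mean/max pooling over a small temporal pyramid (TPMM-style), and trains a linear SVM on the resulting clip descriptor. This is a minimal sketch, not the authors' implementation: the chosen VGG16 layer, frame sampling, pyramid depth, and the fusion of RGB and optical-flow streams are assumptions.

# Illustrative sketch only, not the published MMSP-TPMM implementation.
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

def frame_features(frames):
    """frames: (T, 3, 224, 224) tensor of sampled RGB (or flow-encoded) frames."""
    with torch.no_grad():
        fmap = vgg(frames)                 # (T, 512, 7, 7) last conv feature maps
    fmap = fmap.flatten(2)                 # (T, 512, 49) spatial positions
    # Mean and max spatial pooling: concatenate both statistics per channel.
    return torch.cat([fmap.mean(dim=-1), fmap.amax(dim=-1)], dim=1)   # (T, 1024)

def clip_descriptor(feats, levels=2):
    """Mean/max pooling over a temporal pyramid: whole clip, then halves, ..."""
    parts = []
    for lvl in range(levels):
        for seg in torch.chunk(feats, 2 ** lvl, dim=0):
            parts += [seg.mean(dim=0), seg.amax(dim=0)]
    return torch.cat(parts).numpy()        # fixed-length descriptor per clip

# Training on pre-segmented clips (clips: list of frame tensors, labels: list of ints):
# X = numpy.stack([clip_descriptor(frame_features(c)) for c in clips])
# svm = LinearSVC(C=1.0).fit(X, labels)

In the paper's setting, appearance (RGB) and motion (optical-flow) descriptors are combined; in a sketch like this, one could compute one descriptor per stream and concatenate them before the SVM.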
Funder
Ministry of Higher Education of Malaysia
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Cited by
1 article.