Affiliation:
1. University of Agder
2. Simula Metropolitan Center for Digital Engineering
Abstract
Background
Mobile phones and wearables automatically monitor the daily activity of humans at a large scale. This produces immense amounts of data that can be used to better understand human behavior over time. To understand this data and its possibilities, a structured and controlled collection process is required. Physical activity monitoring using wearable sensors has attracted considerable attention in healthcare, sports science, and fitness applications. However, ensuring the availability of diverse and comprehensive datasets for research and algorithm development can be challenging.
Objective
We emphasize the importance of semantic representation for physical activity sensor observations to enable data interoperability and advanced analytics. In this proof-of-concept study, we propose an approach to improve the usability of physical activity datasets and highlight ethical considerations by generating synthetic datasets from data collected with a medical-grade (CE-certified) sensor. Moreover, we compare real and synthetic activity datasets, evaluating their utility in addressing model bias and fairness in predictive analysis.
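The abstract does not publish the ontology itself, so the following is only a minimal sketch of the general idea of semantic representation: one daily sensor reading is expressed as subject-predicate-object triples that any tool can query in the same way. The term names `Observation`, `madeBySensor`, `hasFeatureOfInterest`, `observedProperty`, and `hasSimpleResult` come from the W3C SOSA vocabulary; the `example.org` namespace, the `onDay` property, the helper functions, and the sample values are hypothetical and are not the authors' ontology.

```python
from dataclasses import dataclass

# Hypothetical namespace for illustration; the authors' ontology IRIs are not given.
EX = "http://example.org/activity#"
# Real W3C vocabularies: SOSA (Sensor, Observation, Sample, Actuator) and RDF.
SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def observation_triples(obs_id, participant, day, prop, value):
    """Describe one daily sensor reading as subject-predicate-object triples."""
    node = EX + obs_id
    return [
        Triple(node, RDF_TYPE, SOSA + "Observation"),
        Triple(node, SOSA + "madeBySensor", EX + "MOX2-5"),
        Triple(node, SOSA + "hasFeatureOfInterest", EX + participant),
        Triple(node, SOSA + "observedProperty", EX + prop),
        Triple(node, SOSA + "hasSimpleResult", str(value)),
        Triple(node, EX + "onDay", day),  # hypothetical property, not from SOSA
    ]


def subjects_with(graph, predicate, obj):
    """Return, sorted, every subject linked to `obj` through `predicate`."""
    return sorted({t.subject for t in graph if t.predicate == predicate and t.obj == obj})


# Usage with made-up values: find all observations about one participant.
graph = (
    observation_triples("obs1", "participant01", "2023-05-01", "steps", 8421)
    + observation_triples("obs2", "participant01", "2023-05-01", "sedentaryMinutes", 610)
    + observation_triples("obs3", "participant02", "2023-05-01", "steps", 5030)
)
print(subjects_with(graph, SOSA + "hasFeatureOfInterest", EX + "participant01"))
```

Because every reading shares the same predicates, a question such as "all step observations" or "all observations for one participant" becomes one uniform lookup, which is the interoperability benefit the objective describes.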
Methods
We designed and developed an ontology for the semantic representation of physical activity sensor observations, and performed predictive analysis on data collected with MOX2-5 activity sensors. The MOX2-5 activity monitoring device can collect and transmit high-resolution activity data such as activity intensity, weight-bearing, sedentary, standing, low physical activity, moderate physical activity, vigorous physical activity, and steps per minute. We collected physical activity data from 16 adults (12 male, 4 female) for 30–45 days (roughly one to one and a half months). This produced only 539 records, which is a small dataset. We therefore used several synthetic data generation methods, namely Gaussian Copula (GC), Conditional Tabular Generative Adversarial Network (CTGAN), and Tabular Generative Adversarial Network (TABGAN), to augment the dataset with synthetic data. For both the real and synthetic datasets, we developed a Multilayer Perceptron (MLP) classification model to classify daily physical activity levels.
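The abstract does not say which implementation of these generators the authors used. To illustrate the simplest of them, here is a minimal Gaussian copula sketch in Python with NumPy, assuming the real data is a numeric table (for example, daily minutes per activity category plus steps); CTGAN and TABGAN are not shown, and the function names are hypothetical. Each column is rank-transformed to standard-normal scores, the correlation matrix of those scores is estimated, correlated normal samples are drawn, and each sample is mapped back through the empirical quantiles of the matching real column.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf, otypes=[float])
_to_uniform = np.vectorize(_STD_NORMAL.cdf, otypes=[float])


def fit_gaussian_copula(real):
    """Fit a Gaussian copula with empirical marginals to an (n_rows, n_cols) table."""
    real = np.asarray(real, dtype=float)
    n = real.shape[0]
    # Ranks 1..n per column (ties broken by order), scaled into the open interval (0, 1).
    ranks = real.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    # Normal scores capture the dependence between columns, separately from the marginals.
    z = _to_normal(u)
    corr = np.atleast_2d(np.corrcoef(z, rowvar=False))
    return {"real": real, "corr": corr}


def sample_gaussian_copula(model, n_samples, seed=0):
    """Draw synthetic rows that follow the fitted correlation and the real marginals."""
    rng = np.random.default_rng(seed)
    corr = model["corr"]
    k = corr.shape[0]
    z = rng.multivariate_normal(np.zeros(k), corr, size=n_samples)
    u = _to_uniform(z)
    # Inverse of each empirical marginal: quantiles of the corresponding real column.
    columns = [np.quantile(model["real"][:, j], u[:, j]) for j in range(k)]
    return np.column_stack(columns)
```

Because synthetic values are drawn from the empirical quantiles, they stay within the observed range of each real column. This avoids impossible values such as negative step counts, but it also means the generator cannot extrapolate beyond what the 16 participants recorded.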
Results
The results highlight that the semantic ontology is suitable for semantic search, knowledge representation, data integration, reasoning, and capturing the meaning of and relationships between data. The analysis supports the hypothesis that the efficiency of predictive models grows with the volume of additional synthetic training data.
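One way to test this hypothesis, sketched here under stated assumptions, is to train the same classifier on the real training rows plus increasing amounts of synthetic rows, and to always score it on held-out real rows. The sketch uses a from-scratch one-hidden-layer MLP in NumPy (ReLU hidden layer, softmax output, full-batch gradient descent on cross-entropy). The hidden size, learning rate, epoch count, and function names are hypothetical choices, not the architecture reported by the authors; features are assumed to be standardized and labels to be integer-coded activity levels.

```python
import numpy as np


def train_mlp(X, y, n_classes, hidden=16, epochs=500, lr=0.1, seed=0):
    """Full-batch gradient descent on a one-hidden-layer MLP with softmax output."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        H = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        G = (P - Y) / len(X)  # gradient of mean cross-entropy w.r.t. logits
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dH = (G @ W2.T) * (H > 0)  # backpropagate through the ReLU
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    """Return the most probable class index for each row."""
    W1, b1, W2, b2 = params
    return np.argmax(np.maximum(0.0, X @ W1 + b1) @ W2 + b2, axis=1)


def accuracy_vs_synthetic(X_real, y_real, X_syn, y_syn, X_test, y_test, n_classes, amounts):
    """Accuracy on held-out real rows after adding k synthetic rows, for each k in amounts."""
    results = {}
    for k in amounts:
        X_train = np.vstack([X_real, X_syn[:k]])
        y_train = np.concatenate([y_real, y_syn[:k]])
        params = train_mlp(X_train, y_train, n_classes)
        results[k] = float(np.mean(predict(params, X_test) == y_test))
    return results
```

Scoring only on real held-out rows is the important design choice: evaluating on synthetic rows would reward the model for matching the generator rather than the sensor data.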
Conclusions
Ontology and generative AI have the potential to accelerate research and innovation in behavioral monitoring. Moreover, the presented data (both the real MOX2-5 data and its synthetic version) can help in creating robust methods for classifying activity types and can support other research directions related to synthetic data, such as model efficiency, detection of generated data, and data privacy.
Publisher
Research Square Platform LLC