Affiliation:
1. University of Science and Technology of China and Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, China
2. University of Science and Technology of China, Hefei, Anhui, China
Abstract
To tap into the gold mine of data generated by Internet of Things (IoT) devices, data of unprecedented volume and value, there is an urgent need to label raw sensor data efficiently and accurately. To this end, we explore and leverage the hidden connections among the multimodal data collected by different sensing devices, and propose letting the data of different modalities complement and learn from each other. However, aligning and fusing multimodal data is challenging without knowing what the data perceive (and thus their correct labels). In this work, we propose MultiSense, a paradigm that automatically mines the underlying perception, cross-labels the data of each modality, and then updates the human-activity-recognition models so that they achieve higher accuracy or even recognize new activities. We design innovative solutions for segmenting, aligning, and fusing multimodal data from different sensors, as well as a model-updating mechanism. We implement our framework and conduct comprehensive evaluations on a rich set of data. Our results demonstrate that MultiSense significantly improves data usability and the power of the learning models. With nine diverse activities performed by users, our framework automatically labels the multimodal sensing data generated by five different sensing mechanisms (video, smart watch, smartphone, audio, and wireless channel) with an average accuracy of 98.5%. Furthermore, it enables the models of some modalities to learn unknown activities from other modalities, greatly improving their activity-recognition ability.
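The cross-labeling idea described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each modality emits time-stamped activity segments with a predicted label and a confidence score, temporally aligns overlapping segments, and propagates a confidence-weighted majority label to modalities that are unsure. All names (`Segment`, `cross_label`, the `min_overlap` threshold) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Segment:
    modality: str          # e.g. "video", "audio", "wifi"
    start: float           # segment start time (seconds)
    end: float             # segment end time (seconds)
    label: Optional[str]   # predicted activity, None if the model is unsure
    conf: float            # prediction confidence in [0, 1]

def overlap(a: Segment, b: Segment) -> float:
    """Fraction of the shorter segment covered by the temporal intersection."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    return inter / max(1e-9, min(a.end - a.start, b.end - b.start))

def cross_label(segments: List[Segment], min_overlap: float = 0.5) -> None:
    """Propagate a confidence-weighted majority label from confident
    modalities to temporally aligned, unsure segments (in place)."""
    for seg in segments:
        votes: Dict[str, float] = {}
        for other in segments:
            if other is seg or other.modality == seg.modality:
                continue
            if other.label is not None and overlap(seg, other) >= min_overlap:
                votes[other.label] = votes.get(other.label, 0.0) + other.conf
        if votes:
            best, weight = max(votes.items(), key=lambda kv: kv[1])
            # Adopt the peers' label only when their pooled confidence
            # exceeds this segment's own.
            if seg.label is None or weight > seg.conf:
                seg.label, seg.conf = best, min(1.0, weight)

# Example: the wireless-channel model is unsure; video and audio agree,
# so their label is propagated and can later be used to update its model.
segs = [
    Segment("video", 0.0, 5.0, "walking", 0.95),
    Segment("audio", 0.2, 5.1, "walking", 0.80),
    Segment("wifi",  0.1, 4.9, None,      0.00),
]
cross_label(segs)
print(segs[2].label)  # -> walking
```

The propagated labels then serve as supervision for retraining the unsure modality's model, which is how a modality can learn activities it has never been labeled with before.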
Funder
National Key R&D Program of China
National Natural Science Foundation of China
The Fundamental Research Funds for the Central Universities
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications