Affiliation:
1. University of California, San Diego, La Jolla, California, USA
2. Standard and Mobility Innovation Lab, Samsung Research America, Plano, Texas, USA
Abstract
Egocentric, non-intrusive sensing of human activities of daily living (ADL) in free-living environments represents a holy grail in ubiquitous computing. Existing approaches, such as egocentric vision and wearable motion sensors, are either intrusive or limited in capturing non-ambulatory actions. To address these challenges, we propose EgoADL, the first egocentric ADL sensing system that uses an in-pocket smartphone as a multi-modal sensor hub to capture body motion and interactions with the physical environment and daily objects using non-visual sensors (audio, wireless sensing, and motion sensors). We collected a 120-hour multi-modal dataset and annotated 20 hours of data into 221 ADLs, 70 object interactions, and 91 actions. EgoADL uses multi-modal frame-wise slow-fast encoders to learn feature representations of multi-sensory data that exploit the complementary advantages of different modalities, and adapts a transformer-based sequence-to-sequence model to decode the time-series sensor signals into a sequence of words that represents ADL. In addition, we introduce a self-supervised learning framework that extracts intrinsic supervisory signals from the multi-modal sensing data to overcome the lack of labeled data and achieve better generalization and extensibility. Our experiments in free-living environments demonstrate that EgoADL achieves performance comparable to video-based approaches, bringing the vision of ambient intelligence closer to reality.
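The abstract describes the architecture only at a high level. The PyTorch sketch below illustrates one plausible reading of multi-modal frame-wise slow-fast encoders feeding a transformer-based sequence-to-sequence decoder; all module names, dimensions, stride choices, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of multi-modal slow-fast encoding plus a transformer
# sequence-to-sequence decoder; details are assumptions for illustration only.
import torch
import torch.nn as nn


class SlowFastEncoder(nn.Module):
    """Encodes one sensor modality at two temporal resolutions."""

    def __init__(self, in_dim: int, d_model: int = 256,
                 fast_stride: int = 1, slow_stride: int = 4):
        super().__init__()
        # Fast pathway: fine temporal resolution.
        self.fast = nn.Conv1d(in_dim, d_model // 2, kernel_size=3,
                              stride=fast_stride, padding=1)
        # Slow pathway: coarse temporal resolution.
        self.slow = nn.Conv1d(in_dim, d_model // 2, kernel_size=3,
                              stride=slow_stride, padding=1)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) frame-wise sensor features.
        x = x.transpose(1, 2)                       # (batch, in_dim, time)
        fast = self.fast(x)                         # (batch, d_model/2, time)
        slow = self.slow(x)                         # (batch, d_model/2, time/stride)
        # Upsample the slow pathway to the fast time axis and fuse by channel.
        slow = nn.functional.interpolate(slow, size=fast.shape[-1], mode="nearest")
        fused = torch.cat([fast, slow], dim=1).transpose(1, 2)  # (batch, time, d_model)
        return self.proj(fused)


class EgoADLSeq2Seq(nn.Module):
    """Fuses per-modality encodings and decodes them into a word sequence."""

    def __init__(self, modality_dims: dict, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: SlowFastEncoder(dim, d_model) for name, dim in modality_dims.items()}
        )
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, inputs: dict, target_tokens: torch.Tensor) -> torch.Tensor:
        # Encode each modality, then concatenate along the time axis as a
        # simple fusion strategy (an assumption in this sketch).
        memory = torch.cat([self.encoders[m](x) for m, x in inputs.items()], dim=1)
        tgt = self.token_emb(target_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt.shape[1])
        decoded = self.transformer(src=memory, tgt=tgt, tgt_mask=mask)
        return self.out(decoded)                    # (batch, tgt_len, vocab_size)


if __name__ == "__main__":
    # Hypothetical feature sizes for the three non-visual modalities.
    model = EgoADLSeq2Seq({"audio": 64, "wireless": 32, "motion": 12}, vocab_size=100)
    batch = {"audio": torch.randn(2, 50, 64),
             "wireless": torch.randn(2, 50, 32),
             "motion": torch.randn(2, 50, 12)}
    logits = model(batch, target_tokens=torch.zeros(2, 8, dtype=torch.long))
    print(logits.shape)  # torch.Size([2, 8, 100])

In this sketch each modality contributes a fused fast/slow representation, and the decoder emits word tokens describing the ADL sequence; the paper's actual fusion, tokenization, and self-supervised pre-training would follow its own design.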
Funder
National Institute On Aging of the National Institutes of Health
National Science Foundation
Samsung collaboration grant
Google Ph.D. Fellowship
Publisher
Association for Computing Machinery (ACM)