Affiliation:
1. Xi'an Jiaotong University, China
2. Zhejiang University, China
Abstract
Radio frequency (RF) devices such as Wi-Fi transceivers, radio frequency identification (RFID) tags, and millimeter-wave (mmWave) radars have become widespread in daily life. The presence and movement of humans affect the propagation of RF signals, and this phenomenon can be exploited for human action recognition. Compared to camera-based solutions, RF approaches are more resilient to occlusion and lighting conditions and raise fewer privacy concerns in indoor scenarios. However, current work has many limitations, including unavailable datasets, insufficient training samples, and simple or limited action categories tailored to specific applications. These limitations seriously hinder the growth of RF solutions and present a significant obstacle to moving RF sensing research from the laboratory to a wide range of everyday applications. To facilitate this transition, we introduce and release XRF55, a large-scale multi-radio-frequency dataset for indoor human action analysis. XRF55 encompasses 42.9K RF samples and 55 action classes covering human-object interactions, human-human interactions, fitness, body motions, and human-computer interactions, collected from 39 subjects within 100 days. These actions were selected from 19 RF sensing papers and 16 video action recognition datasets. Each action was chosen to support applications with high practical value, such as elderly fall detection, fatigue monitoring, and domestic violence detection. Moreover, XRF55 was collected using 23 RFID tags at 922.38 MHz, 9 Wi-Fi links at 5.64 GHz, one mmWave radar at 60-64 GHz, and one Azure Kinect with RGB+D+IR sensors, so its RF signals span decimeter, centimeter, and millimeter wavelengths. In addition, we apply a mutual learning strategy over XRF55 for the task of action recognition. Unlike simple modality fusion, under mutual learning the three RF modalities are trained collaboratively and then each works on its own. We find that the three RF modalities promote each other. Notably, with the synchronized Kinect, XRF55 also supports the exploration of action detection, action segmentation, pose estimation, human parsing, mesh reconstruction, and related tasks, with RF-only or RF-Vision approaches.
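As a quick check on the wavelength bands named in the abstract, the free-space wavelength of each carrier follows from λ = c / f. The sketch below is illustrative only and not part of the XRF55 release: the helper names `wavelength_m` and `band_name` are hypothetical, and the band boundaries simply follow the order of magnitude of the wavelength.

```python
# Speed of light in vacuum (m/s); exact by SI definition.
C = 299_792_458.0


def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength in metres for a carrier frequency in Hz."""
    return C / freq_hz


def band_name(wavelength: float) -> str:
    """Classify a wavelength (metres) by its order of magnitude."""
    if 0.1 <= wavelength < 1.0:
        return "decimeter wave"
    if 0.01 <= wavelength < 0.1:
        return "centimeter wave"
    if 0.001 <= wavelength < 0.01:
        return "millimeter wave"
    return "other"


# Carrier frequencies quoted in the abstract.
carriers_hz = {
    "RFID (922.38 MHz)": 922.38e6,
    "Wi-Fi (5.64 GHz)": 5.64e9,
    "mmWave radar, lower edge (60 GHz)": 60e9,
    "mmWave radar, upper edge (64 GHz)": 64e9,
}

for name, freq in carriers_hz.items():
    lam = wavelength_m(freq)
    print(f"{name}: {lam * 100:.2f} cm -> {band_name(lam)}")
```

The RFID carrier comes out at roughly 32.5 cm, the Wi-Fi carrier at roughly 5.3 cm, and the radar band at roughly 4.7 to 5.0 mm, which matches the decimeter, centimeter, and millimeter labels.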
Funder
'Pioneer' and 'Leading Goose' R&D Program of Zhejiang
China Postdoctoral Science Foundation
Key Research and Development Program of Shaanxi
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)