Affiliation:
1. Department of Electrical and Computer Engineering, The University of Alabama, Tuscaloosa, Alabama, USA
2. Department of American Sign Language, Gallaudet University, Washington, District of Columbia, USA
3. Department of Psychology, Gallaudet University, Washington, District of Columbia, USA
4. Department of Communication Studies, The University of Alabama, Tuscaloosa, Alabama, USA
5. Department of Computer Science, The University of Alabama, Tuscaloosa, Alabama, USA
Abstract
Over the past decade, there have been great advancements in radio frequency sensor technology for human–computer interaction applications, such as gesture recognition, and human activity recognition more broadly. While there is a significant body of research on these topics, in most cases, experimental data are acquired in controlled settings by directing participants on which motions to articulate. However, especially for communicative motions, such as sign language, such directed data sets do not accurately capture natural, in situ articulations. This results in a difference between the distributions of directed American Sign Language (ASL) and natural ASL, which severely degrades natural sign language recognition in real‐world scenarios. To overcome these challenges and acquire more representative data for training deep models, the authors develop an interactive gaming environment, ChessSIGN, which records video and radar data of participants as they play the game without any external direction. The authors investigate various ways of generating synthetic samples from directed ASL data, but show that ultimately such data offer little improvement over simply initialising with imagery from ImageNet. In contrast, the authors propose an interactive learning paradigm in which model training is shown to improve as more and more natural ASL samples are acquired and augmented via synthetic samples generated from a physics‐aware generative adversarial network. The authors show that the proposed approach enables the recognition of natural ASL in a real‐world setting, achieving an accuracy of 69% for 29 ASL signs—a 60% improvement over conventional training with directed ASL data.
Funder
National Science Foundation
American Association of University Women
Publisher
Institution of Engineering and Technology (IET)