Affiliation:
1. University of Science and Technology of China, Hefei, China
2. University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China
3. University of Science and Technology of China, China
Abstract
Sign language recognition (SLR) is a challenging problem, involving complex manual features (i.e., hand gestures) and fine-grained non-manual features (NMFs) (i.e., facial expression, mouth shapes,
etc
.). Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity introduces great challenges in the recognition of sign words. To tackle the above issue, we propose a simple yet effective architecture called Global-Local Enhancement Network (GLE-Net), including two mutually promoted streams toward different crucial aspects of SLR. Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues. Moreover, due to the lack of datasets explicitly focusing on this kind of feature, we introduce the first non-manual-feature-aware isolated Chinese sign language dataset (NMFs-CSL) with a total vocabulary size of 1,067 sign words in daily life. Extensive experiments on NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications,Hardware and Architecture
Cited by
28 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献