Enhanced by Visual and Semantic Consistency for Continuous Sign Language Recognition

Authors:

Xiong Sije1, Zou Chunlong2, Yun Juntong1, Jiang Du1, Huang Li1, Liu Ying3, Li Gongfa1, Xie Yuanmin1

Affiliations:

1. Wuhan University of Science and Technology

2. Hubei University of Automotive Technology

3. Hubei Engineering University

Abstract

A camera-based interface enables simple human-computer interaction through intuitive sign language for hearing-impaired users. Sign language, as a visual language, uses changes in hand shape, body movement, and facial expression to convey information collaboratively. Most current continuous sign language recognition (CSLR) models focus on extracting information from each individual frame and ignore the dynamically changing characteristics of the signer across multiple frames. This contrasts with the essence of sign language recognition, which aims to learn the most essential feature representations of changes in the hand-controlled and non-hand-controlled parts and convert them into language. In this paper, we first use a feature alignment method to explicitly capture the spatial position offset and motion direction information between neighboring frames, directing a dynamic attention mechanism to focus on regions of subtle change and thus enhancing visual representation extraction. We then propose a dynamic decoding method based on maximum backtracking probability, which decodes word-level features and enforces word-consistency constraints without additional computational cost, enhancing semantic consistency. Combining these, we obtain a comprehensive CSLR model built on a Dynamic Attention Mechanism and Maximum Backtracking Probability Dynamic Decoding (DAM-MCD), which improves the model's inference capability and robustness. Experiments on two publicly available datasets, PHOENIX14 (Koller et al., Comput Vis Image Underst 141:108–125, 2015) and PHOENIX14-T (Camgoz et al., in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7784–7793, 2018), show that DAM-MCD achieves higher accuracy than methods employing multi-cue input. The results further show that DAM-MCD effectively captures sign language motion information in videos. Models will be made public at https://github.com/smellno/Continuous-Sign-Language-Recognition-.
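
The alignment-plus-attention idea can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released code: it shifts per-frame feature maps by one step, uses the inter-frame difference as a crude motion cue, and turns that cue into an attention map that amplifies the changing regions. The module name FrameChangeAttention and all tensor shapes are assumptions made for illustration.

```python
# Minimal sketch (NOT the authors' implementation) of motion-guided
# dynamic attention over per-frame visual features.
import torch
import torch.nn as nn

class FrameChangeAttention(nn.Module):  # hypothetical name
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv maps the inter-frame difference to one attention map
        self.to_attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) per-frame feature maps from a visual backbone
        prev = torch.cat([feats[:1], feats[:-1]], dim=0)  # shift by 1 frame
        diff = feats - prev                      # crude motion/offset cue
        attn = torch.sigmoid(self.to_attn(diff)) # (T, 1, H, W) in (0, 1)
        return feats * (1.0 + attn)              # amplify changing regions

if __name__ == "__main__":
    x = torch.randn(8, 64, 14, 14)               # 8 frames, 64-ch features
    print(FrameChangeAttention(64)(x).shape)     # torch.Size([8, 64, 14, 14])
```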
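
The abstract does not spell out the maximum-backtracking-probability decoder, so the following NumPy sketch only illustrates the general flavor of decoding with backtracking: keep the best-scoring path into each gloss label at every frame, then recover the output sequence by following backpointers (classic Viterbi decoding). The function viterbi_backtrack and the transition matrix are hypothetical stand-ins, not the paper's algorithm.

```python
# Generic Viterbi-style best-path decoding with backpointers, as an
# illustration of backtracking-based decoding (NOT the DAM-MCD decoder).
import numpy as np

def viterbi_backtrack(log_probs: np.ndarray, transitions: np.ndarray):
    """log_probs: (T, V) per-frame log-probabilities over V gloss labels.
    transitions: (V, V) log-transition scores between labels."""
    T, V = log_probs.shape
    score = log_probs[0].copy()          # best score ending in each label
    back = np.zeros((T, V), dtype=int)   # backpointers per frame/label
    for t in range(1, T):
        cand = score[:, None] + transitions        # (V_prev, V_cur)
        back[t] = cand.argmax(axis=0)              # best predecessor
        score = cand.max(axis=0) + log_probs[t]
    path = [int(score.argmax())]                   # best final label
    for t in range(T - 1, 0, -1):                  # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lp = np.log(rng.dirichlet(np.ones(5), size=10))  # 10 frames, 5 glosses
    tr = np.log(np.full((5, 5), 0.2))                # uniform transitions
    print(viterbi_backtrack(lp, tr))
```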

Publisher

Springer Science and Business Media LLC

