Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing

Published: 2024-05-13
Volume: 8, Issue: 2, Pages: 1-29
ISSN: 2474-9567
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Journal abbreviation: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.
Language: English

Authors:

Zhang Qian¹, Lan Yubin¹, Guo Kaiyi¹, Wang Dong¹

Affiliation:

1. Shanghai Jiao Tong University, China

Abstract

Silent Speech Interfaces (SSI) on mobile devices offer a privacy-friendly alternative to conventional voice input. Previous research has focused primarily on smartphones. In this paper, we introduce Lipwatch, a system that uses acoustic sensing to enable SSI on smartwatches. Lipwatch emits inaudible waves from the watch's speaker to capture lip movements and then analyzes the resulting echoes to recognize silent speech. Unlike acoustic-sensing-based SSI on smartphones, Lipwatch is designed around the specific usage scenarios and requirements of smartwatches. First, we propose a wake-up-free mechanism that lets users interact without a wake-up phrase or button press. The mechanism combines gesture detection from the smartwatch's inertial sensors with lip-movement detection from the acoustic signal to decide whether SSI should be activated. Second, we design a flexible silent speech recognition mechanism that builds on limited-vocabulary recognition to understand a broader range of user commands, including commands not present in the training dataset, so that users need not adhere strictly to predefined commands. We evaluate Lipwatch with 15 participants on a set of the 80 most common smartwatch interaction commands. The system achieves a Word Error Rate (WER) of 13.7% in user-independent tests. Even when users utter commands containing words absent from the training set, Lipwatch achieves 88.7% top-3 accuracy. We implement a real-time version of Lipwatch on a commercial smartwatch, and a user study shows that Lipwatch can be a practical and promising option for enabling SSI on smartwatches.
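The abstract reports results as a Word Error Rate (WER) and as top-3 accuracy. As a minimal sketch of how these two standard metrics are conventionally computed, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment and N is the number of reference words; top-k accuracy is the fraction of samples whose true label appears among the k highest-ranked candidates. This is not the authors' evaluation code: the function names `word_error_rate` and `top_k_accuracy` and the example commands are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N over whitespace-separated words.

    The numerator is the word-level Levenshtein distance, which equals the
    minimum number of substitutions, deletions and insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words and
    # the first j hypothesis words; only one DP row is kept at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], labels: Sequence[str], k: int = 3
) -> float:
    """Fraction of samples whose true label is among the first k candidates."""
    if len(ranked) != len(labels) or not labels:
        raise ValueError("need exactly one ranking per label, and at least one label")
    hits = sum(label in cands[:k] for cands, label in zip(ranked, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical command: one deletion ("a") and one substitution
    # ("five" -> "nine") over six reference words.
    print(word_error_rate("set a timer for five minutes", "set timer for nine minutes"))
    ranked = [["call mom", "call tom", "call bob"], ["play music", "pause music", "stop music"]]
    print(top_k_accuracy(ranked, ["call bob", "next song"], k=3))
```

In the first example the alignment needs one deletion and one substitution over six reference words, so WER is 2/6 ≈ 0.33. Because insertions are counted but N covers only reference words, WER can exceed 1 when a hypothesis contains many extra words.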

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3659614

Cited by 1 article

1. Sensing to Hear through Memory. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2024-05-13.