Affiliation:
1. Stony Brook University, New York, USA
2. University of California, Merced, Merced, USA
3. University of Texas at Arlington, Arlington, USA
Abstract
In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt presents an intuitive alternative to voice-based interactions, which can be unreliable in noisy environments, disruptive to those around us, and compromising to our privacy. We propose a twin-IMU setup to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and each syllable further into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by tracking jaw motion alone is challenging: the jaw is a secondary articulator, so its motion is not distinctive enough on its own for unvoiced speech recognition. MuteIt combines IMU data with the anatomy of jaw movement and principles from linguistics to model word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to be easily scaled to a large set of words. We validate MuteIt with 20 subjects with diverse speech accents on recognizing 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. Compared with common voice assistants, MuteIt outperforms them in noisy acoustic environments, achieving over 90% recognition accuracy. Even in the presence of motion artifacts, such as head movement, walking, and riding in a moving vehicle, MuteIt achieves a mean word recognition accuracy of 91% over all scenarios.
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture, Human-Computer Interaction
Cited by
19 articles.
1. Enabling Hands-Free Voice Assistant Activation on Earphones;Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services;2024-06-03
2. F²Key: Dynamically Converting Your Face into a Private Key Based on COTS Headphones for Reliable Voice Interaction;Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services;2024-06-03
3. Speaking of accent: A content analysis of accent misconceptions in ASR research;The 2024 ACM Conference on Fairness, Accountability, and Transparency;2024-06-03
4. Lipwatch: Enabling Silent Speech Recognition on Smartwatches using Acoustic Sensing;Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies;2024-05-13
5. MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices;Proceedings of the CHI Conference on Human Factors in Computing Systems;2024-05-11