Authors:
Figueroa David, Nishio Shuichi, Yamazaki Ryuji, Ishiguro Hiroshi
Abstract
Deploying voice-operated robots in real-life settings raises issues that do not arise under controlled laboratory conditions. In our study, we placed conversation robots in the homes of 18 older adults to increase the participants' conversation activity. A manual examination of the audio that the robot classified as human voice showed that a considerable portion came from television sound in the participants' homes. We used these data to train a neural network that differentiates human speech from speech-like television sound, achieving high performance metrics. We then analyzed how the participants' voices contain inherent patterns, some general and some uncommon, and how these patterns affect the performance of our algorithm when it identifies human speech with or without them.
Subject
Industrial and Manufacturing Engineering