Affiliation:
1. School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
2. Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom
Abstract
Understanding speech in noisy environments is a challenging task, especially in communication situations with several competing speakers. Despite their ongoing improvement, assistive listening devices and speech processing approaches still do not perform well enough in noisy multi-talker environments, as they may fail to restore the intelligibility of a speaker of interest among competing sound sources. In this study, a quasi-causal deep learning algorithm was developed that can extract the voice of a target speaker, as indicated by a short enrollment utterance, from a mixture of multiple concurrent speakers in background noise. Objective evaluation with computational metrics demonstrated that the speaker-informed algorithm successfully extracts the target speaker from noisy multi-talker mixtures. This was achieved using a single algorithm that generalized to unseen speakers, different numbers of speakers and relative speaker levels, and different speech corpora. Double-blind sentence recognition tests on mixtures of one, two, and three speakers in restaurant noise were conducted with listeners with normal hearing and listeners with hearing loss. Results indicated significant intelligibility improvements with the speaker-informed algorithm of 17% and 31% for people without and with hearing loss, respectively. In conclusion, it was demonstrated that deep learning-based speaker extraction can enhance speech intelligibility in noisy multi-talker environments where uninformed speech enhancement methods fail.
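For illustration only (this is not the code released with the article): a minimal PyTorch sketch of the speaker-informed extraction idea described in the abstract, in which a speaker encoder maps a short enrollment utterance to an embedding that conditions a masking network applied to the noisy multi-talker mixture. The class names, layer sizes, unidirectional GRU architecture, and magnitude-spectrogram front end are assumptions chosen for brevity, not details taken from the paper.

# Minimal sketch of speaker-informed target speaker extraction (assumed
# architecture, not the authors' implementation).
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Maps an enrollment spectrogram to a fixed-length speaker embedding."""

    def __init__(self, n_freq=257, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_freq, 256, num_layers=2, batch_first=True)
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, enroll_mag):          # (batch, frames, n_freq)
        _, h = self.rnn(enroll_mag)         # h: (layers, batch, 256)
        return self.proj(h[-1])             # (batch, emb_dim)


class SpeakerInformedExtractor(nn.Module):
    """Estimates a mask for the target speaker, conditioned on the embedding."""

    def __init__(self, n_freq=257, emb_dim=128):
        super().__init__()
        self.spk_enc = SpeakerEncoder(n_freq, emb_dim)
        self.rnn = nn.GRU(n_freq + emb_dim, 512, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(512, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, enroll_mag):
        emb = self.spk_enc(enroll_mag)                         # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, mix_mag.size(1), -1) # repeat per frame
        feats = torch.cat([mix_mag, emb], dim=-1)              # condition on speaker
        mask = self.mask(self.rnn(feats)[0])                   # (batch, frames, n_freq)
        return mask * mix_mag                                  # extracted target magnitude


# Toy usage with random magnitude spectrograms standing in for STFT features.
model = SpeakerInformedExtractor()
mix = torch.rand(1, 200, 257)       # noisy multi-talker mixture
enroll = torch.rand(1, 100, 257)    # short enrollment utterance of the target speaker
target_est = model(mix, enroll)     # (1, 200, 257)

The unidirectional GRUs keep the masking step causal frame by frame, loosely mirroring the low-latency, quasi-causal constraint mentioned in the abstract; the enrollment utterance only needs to be processed once per target speaker.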
Funder
Medical Research Foundation
Publisher
Acoustical Society of America (ASA)