Spatial speech detection for binaural hearing aids using deep phoneme classifiers

Published: 2022, Volume: 6, Page: 25
ISSN: 2681-4617
Journal: Acta Acustica (Acta Acust.)

Author:

Kayser Hendrik, Hermansky Hynek, Meyer Bernd T.

Abstract

Current hearing aids are limited in their ability to optimize speech enhancement specifically for speech coming from spatial sound sources. In this study, we therefore propose an approach for the spatial detection of speech that is based on sound source localization and blind optimization of speech enhancement for binaural hearing aids. We combined an estimator of the direction of arrival (DOA), which offers high spatial resolution but is not specialized to speech, with a speech quality measure of low spatial resolution that is computed after directional filtering. The DOA estimator provides the spatial probability of sound sources in the frontal horizontal plane. The speech quality measure is based on phoneme representations obtained from a deep neural network that is part of a hybrid automatic speech recognition (ASR) system. Three ASR-based speech quality measures (ASQMs) are explored: entropy, mean temporal distance (M-Measure), and matched phoneme (MaP) filtering. We tested the approach in four acoustic scenes with one speaker and either a localized or a diffuse noise source at various signal-to-noise ratios (SNRs) in anechoic or reverberant conditions. The effects of incorrect spatial filtering and of noise were analyzed. We show that two of the three ASQMs (M-Measure and MaP filtering) are suitable for reliably identifying the speech target under different conditions. The system is not adapted to the environment and requires neither a priori information about the acoustic scene nor a reference signal to estimate the quality of the enhanced speech signal. Nevertheless, our approach performs well in all tested acoustic scenes and across varying SNRs, and it reliably detects incorrect spatial filtering angles.
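
The abstract names entropy and the M-Measure as two of the ASR-based speech quality measures computed on phoneme posteriors. As a rough illustration only, the Python sketch below computes both on a list of per-frame phoneme posterior vectors, using definitions common in the literature: the average per-frame Shannon entropy, and the M-Measure as the mean symmetric Kullback-Leibler divergence between posterior vectors that lie dt frames apart, averaged over a set of lags. It is not the authors' implementation. The function names, the `EPS` floor, the choice of KL(p||q) + KL(q||p) as the symmetric divergence, and the lag set passed to `m_measure` are all assumptions, and MaP filtering is not sketched. In this reading, lower entropy and a higher M-Measure at large lags are taken as signs of posteriors less corrupted by noise or by a misdirected spatial filter.

```python
import math
from typing import Sequence

# Assumed input format: one phoneme-posterior vector per frame, each summing to 1.
Posteriors = Sequence[Sequence[float]]

# Hypothetical floor to avoid log(0); not a value taken from the paper.
EPS = 1e-12


def frame_entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single posterior vector."""
    return -sum(x * math.log(max(x, EPS)) for x in p)


def mean_entropy(post: Posteriors) -> float:
    """Average per-frame entropy; lower means a more confident classifier."""
    return sum(frame_entropy(p) for p in post) / len(post)


def sym_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p) = sum((p - q) * log(p / q))."""
    total = 0.0
    for a, b in zip(p, q):
        a, b = max(a, EPS), max(b, EPS)
        total += (a - b) * math.log(a / b)
    return total


def m_measure_at(post: Posteriors, dt: int) -> float:
    """M(dt): mean symmetric KL between posterior vectors dt frames apart."""
    pairs = [(post[t - dt], post[t]) for t in range(dt, len(post))]
    return sum(sym_kl(p, q) for p, q in pairs) / len(pairs)


def m_measure(post: Posteriors, lags: Sequence[int]) -> float:
    """Average of M(dt) over the usable lags; the lag set is an assumed parameter."""
    usable = [dt for dt in lags if 0 < dt < len(post)]
    if not usable:
        raise ValueError("utterance too short for the requested lags")
    return sum(m_measure_at(post, dt) for dt in usable) / len(usable)


if __name__ == "__main__":
    # Made-up 3-class posteriors for illustration only.
    clean = [[0.9, 0.05, 0.05]] * 5 + [[0.05, 0.9, 0.05]] * 5 + [[0.05, 0.05, 0.9]] * 5
    noisy = [[0.4, 0.3, 0.3]] * 5 + [[0.3, 0.4, 0.3]] * 5 + [[0.3, 0.3, 0.4]] * 5
    print(mean_entropy(clean) < mean_entropy(noisy))  # True
    print(m_measure(clean, [5, 10]) > m_measure(noisy, [5, 10]))  # True
```

With these toy inputs, the sharp, phoneme-changing sequence yields lower entropy and a larger M-Measure than the flat one. The numbers are invented and say nothing about the results reported in the paper.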

Funder

National Institute on Deafness and Other Communication Disorders

Deutsche Forschungsgemeinschaft

Publisher

EDP Sciences

Subject

Electrical and Electronic Engineering, Speech and Hearing, Computer Science Applications, Acoustics and Ultrasonics

Link

https://acta-acustica.edpsciences.org/10.1051/aacus/2022013/pdf
