Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios

Published:2022-12 Issue:6 Volume:152 Page:3458-3470
ISSN:0001-4966
Container-title:The Journal of the Acoustical Society of America
language:en
Short-container-title:The Journal of the Acoustical Society of America

Author:

Graetzer Simone¹, Hopkins Carl¹

Affiliation:

1. Acoustics Research Unit, School of Architecture, University of Liverpool, Liverpool, L69 7ZN, United Kingdom

Abstract

The literature shows that the intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain for signal-to-noise ratios (SNRs) between −10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at lower SNRs, that is, SNRs from −29 to −5 dB. These comprise an Ideal Binary Mask (IBM) with a Local Criterion (LC) set to 0 dB and an Ideal Ratio Mask (IRM). The performance of three intrusive Short-Time Objective Intelligibility (STOI) variants, namely STOI, STOI+, and Extended Short-Time Objective Intelligibility (ESTOI), is compared with that of other monaural intelligibility metrics that can be used before and after mask-based processing. The results show that IRMs can be used to obtain near maximal speech intelligibility (>90% for sentence material) even at very low mixture SNRs, while IBMs with LC = 0 provide limited intelligibility gains for SNR < −14 dB. It is also shown that, unlike STOI, STOI+ and ESTOI are suitable metrics for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0 even when speech is high-pass filtered to flatten the spectral tilt before masking.
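
As a rough illustration of the two mask types named in the abstract, the sketch below computes an IBM and an IRM from per-unit speech and noise powers in the time-frequency domain. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the strict "greater than LC" comparison, and the IRM exponent `beta = 0.5` follow common textbook definitions and are not taken from this record.

```python
import math


def local_snr_db(speech_power, noise_power):
    """Local SNR in dB for a single time-frequency unit."""
    if noise_power == 0.0:
        return math.inf
    if speech_power == 0.0:
        return -math.inf
    return 10.0 * math.log10(speech_power / noise_power)


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the Local Criterion, else 0.

    Assumption: a strict '>' comparison against lc_db is used here.
    Inputs are 2-D lists (frames x frequency bands) of power values.
    """
    return [
        [1.0 if local_snr_db(s, n) > lc_db else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM: soft gain (S / (S + N)) ** beta for each time-frequency unit.

    Assumption: beta = 0.5 is a commonly used exponent, not a value
    confirmed by this record.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask


# Hypothetical example: one frame, three bands, unit noise power per band.
speech = [[4.0, 1.0, 0.25]]
noise = [[1.0, 1.0, 1.0]]
ibm = ideal_binary_mask(speech, noise, lc_db=0.0)
irm = ideal_ratio_mask(speech, noise)
```

In this toy example the binary mask keeps only the band whose local SNR is above 0 dB, whereas the ratio mask passes an attenuated version of every band, including those where noise dominates. That difference is one plausible reading of why a soft mask might retain more speech information at very low mixture SNRs.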

Publisher

Acoustical Society of America (ASA)

Subject

Acoustics and Ultrasonics, Arts and Humanities (miscellaneous)

Link

https://asa.scitation.org/doi/pdf/10.1121/10.0016494

References: 38 articles.

1. On the optimality of ideal binary time–frequency masks

2. Binary and ratio time-frequency masks for robust speech recognition

3. Speech Recognition with Primarily Temporal Cues

4. Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech

Cited by 2 articles.

1. Using deep learning to improve the intelligibility of a target speaker in noisy multi-talker environments for people with normal hearing and hearing loss;The Journal of the Acoustical Society of America;2024-07-01

2. Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications;Sensors;2023-04-29