Zero-shot test-time adaptation via knowledge distillation for personalized speech denoising and dereverberation

Author:

Kim Sunwoo (1), Athi Mrudula (1), Shi Guangji (1), Kim Minje (1, 2), Kristjansson Trausti (1)

Affiliation:

1. Amazon Lab126, Sunnyvale, California 94089, USA

2. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

Abstract

A personalization framework to adapt compact models to test-time environments and improve their speech enhancement (SE) performance in noisy and reverberant conditions is proposed. The use cases are those in which the end-user device encounters only one or a few speakers and noise types that tend to recur in the specific acoustic environment. Hence, a small personalized model that is sufficient to handle this focused subset of the original universal SE problem is postulated. The study addresses a major data shortage issue: although the goal is to learn from a specific user's speech signals and the test-time environment, the target clean speech is unavailable for model training due to privacy-related concerns and the technical difficulty of recording noise- and reverberation-free voice signals. The proposed zero-shot personalization method uses no clean speech target. Instead, it employs the knowledge distillation framework, where the more advanced denoising results from an overly large teacher work as pseudo targets to train a small student model. Evaluation on various test-time conditions suggests that the proposed personalization approach can significantly enhance the compact student model's test-time performance. Personalized models outperform larger non-personalized baseline models, demonstrating that personalization achieves model compression with no loss in dereverberation and denoising performance.
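The distillation idea in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the "teacher" is a fixed 3-tap smoothing filter standing in for a large pretrained SE model, the "student" is a learnable 3-tap filter standing in for a compact network, and the test-time recordings are random noise rather than speech. What it shows is only the training signal: the student is fitted to the teacher's outputs on unlabeled noisy input, and no clean target appears anywhere.

```python
import random

def fir(weights, x):
    """Apply a short centered FIR filter, treating samples outside x as zero."""
    half = len(weights) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = n + k - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

TEACHER_TAPS = [0.25, 0.5, 0.25]  # hypothetical stand-in for a large teacher model

def teacher(x):
    """Produce the pseudo target from the noisy input alone."""
    return fir(TEACHER_TAPS, x)

def distill_loss(weights, x):
    """Mean squared error between student output and teacher pseudo target."""
    target = teacher(x)
    est = fir(weights, x)
    return sum((e - t) ** 2 for e, t in zip(est, target)) / len(x)

def distill_step(weights, x, lr):
    """One gradient-descent step on the distillation loss for one noisy clip."""
    half = len(weights) // 2
    target = teacher(x)
    est = fir(weights, x)
    grads = [0.0] * len(weights)
    for n in range(len(x)):
        err = est[n] - target[n]
        for k in range(len(weights)):
            idx = n + k - half
            if 0 <= idx < len(x):
                grads[k] += 2.0 * err * x[idx] / len(x)
    return [w - lr * g for w, g in zip(weights, grads)]

def personalize(noisy_clips, steps=200, lr=0.1):
    """Adapt the student using only noisy test-time clips (no clean speech)."""
    weights = [0.0, 1.0, 0.0]  # start from an identity (pass-through) student
    for _ in range(steps):
        for clip in noisy_clips:
            weights = distill_step(weights, clip, lr)
    return weights

random.seed(0)
# Synthetic "test-time recordings": white noise clips, purely for illustration.
clips = [[random.gauss(0.0, 1.0) for _ in range(256)] for _ in range(4)]

before = sum(distill_loss([0.0, 1.0, 0.0], c) for c in clips) / len(clips)
student = personalize(clips)
after = sum(distill_loss(student, c) for c in clips) / len(clips)
```

In this toy setting the student can represent the teacher exactly, so the distillation loss can be driven close to zero. With a genuinely compact student and a much larger teacher that would not hold in general, and the abstract's claim is instead that restricting the problem to one user's speakers and noise types makes a small model sufficient.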

Funder

National Science Foundation

Publisher

Acoustical Society of America (ASA)
