Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement

Author:

Wang Dahan (1, 2), Hou Zhongshu (1, 2), Hu Yuxiang (2), Zhu Changbao (2), Lu Jing (1, 2), Chen Jingdong (3)

Affiliation:

1. Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University, Nanjing 210093, People's Republic of China

2. NJU-Horizon Intelligent Audio Lab, Horizon Robotics, Beijing 100094, People's Republic of China

3. Center of Intelligent Acoustics and Immersive Communications and Shaanxi Provincial Key Laboratory of Artificial Intelligence, Northwestern Polytechnical University, Xi'an 710072, People's Republic of China

Abstract

Numerous advanced and lightweight signal processing methods have been presented for single-channel speech enhancement (SE), and it is important to explore carefully how to combine, integrate, and balance them efficiently. This paper proposes a more effective and less resource-intensive SE system, focused on the integration and adaptation of several approaches, especially temporal cepstrum smoothing (TCS). First, a more robust fundamental frequency estimator is employed within TCS, mitigating the performance limitations caused by the inaccuracy of the original estimator. Additionally, a harmonic enhancement mechanism is introduced to recover weak harmonic components. Incorporating the modified TCS into the a posteriori speech presence probability estimation refines the unbiased minimum mean square error noise power spectral density estimator. The modified TCS is also used for the a priori signal-to-noise ratio estimation. Moreover, the log-spectral amplitude estimator is enhanced by applying both super-Gaussian speech priors and speech presence uncertainty. Experimental evaluations demonstrate that the proposed method improves speech quality while maintaining modest computational and storage requirements. Furthermore, the proposed system performs comparably to several baseline systems based on lightweight deep neural networks.
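For readers unfamiliar with TCS, the basic idea can be sketched as follows. This is a minimal illustration of generic temporal cepstrum smoothing, not the modified scheme proposed in the paper: the function names `make_alpha` and `tcs_step`, the smoothing constants, and the quefrency cutoff are all assumptions chosen for illustration, the pitch bin is taken as given instead of being estimated, and the bias compensation that practical TCS implementations apply is omitted.

```python
import numpy as np

def make_alpha(n, pitch_bin=None, low_cut=4,
               alpha_env=0.2, alpha_pitch=0.4, alpha_rest=0.95):
    """Quefrency-dependent smoothing constants (illustrative values).

    Low quefrencies (spectral envelope) and the pitch bin are smoothed
    lightly; all other bins are smoothed heavily. The vector is kept
    symmetric so the smoothed cepstrum stays even.
    """
    alpha = np.full(n, alpha_rest)
    alpha[:low_cut] = alpha_env
    alpha[n - low_cut + 1:] = alpha_env       # mirror of bins 1..low_cut-1
    if pitch_bin is not None:
        alpha[pitch_bin] = alpha_pitch
        alpha[n - pitch_bin] = alpha_pitch    # mirrored pitch peak
    return alpha

def tcs_step(power_spec, prev_ceps, alpha):
    """Smooth one frame of a one-sided power spectrum in the cepstral domain.

    power_spec: length K = n/2 + 1, positive values
    prev_ceps:  smoothed cepstrum of the previous frame, length n
    alpha:      per-quefrency smoothing constants, length n
    Returns the smoothed power spectrum and the updated cepstrum.
    """
    log_spec = np.log(np.maximum(power_spec, 1e-12))
    ceps = np.fft.irfft(log_spec)             # real cepstrum, length n
    smoothed = alpha * prev_ceps + (1.0 - alpha) * ceps
    out = np.exp(np.fft.rfft(smoothed).real)  # back to the power domain
    return out, smoothed
```

In a frame loop, `prev_ceps` would be initialized with the cepstrum of the first frame and `alpha` rebuilt whenever the pitch bin changes. Heavy smoothing of the remaining quefrency bins suppresses the random fluctuations of the spectral estimate, while the light smoothing of the envelope and pitch bins preserves the speech structure.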

Funder

National Natural Science Foundation of China

Publisher

Acoustical Society of America (ASA)

