Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions

Authors:

Rudramurthy M.S.¹, Pathak Nilabh Kumar¹, Prasad V. Kamakshi², Kumaraswamy R.³

Affiliation:

1. Department of IS&E, Siddaganga Institute of Technology, Tumkur 572103, Karnataka, India

2. School of Information Technology, JNTU, Kukatpally, Hyderabad 500085, Andhra Pradesh, India

3. Senior Member, IEEE, Department of EC&E, Siddaganga Institute of Technology, Tumkur 572103, Karnataka, India

Abstract

Speaker recognition (SR) under mismatched conditions is a challenging task. The speech signal is nonlinear and nonstationary and is therefore difficult to analyze under realistic conditions. Moreover, in real conditions the nature of the noise present in the speech data is not known a priori. As a result, the performance of speaker identification (SI) and speaker verification (SV) degrades considerably. Every SR system uses a voice activity detector (VAD) as its front-end subsystem, and the performance of most VADs deteriorates under degraded or realistic conditions in which noise plays a major role. Speech analysis and processing using Norden E. Huang's empirical mode decomposition (EMD), combined with the Hilbert transform in what is commonly called the Hilbert–Huang transform (HHT), has recently become an emerging trend. EMD is an a posteriori, adaptive, time-domain data analysis tool that is widely accepted by the research community, and its use for speech recognition and SR tasks has been increasing. An EMD-based VAD has become an important adaptive subsystem of an SR system because it largely mitigates the effect of mismatch between the training and testing phases. We recently developed a VAD algorithm using a zero-frequency filter-assisted peaking resonator (ZFFPR) and EMD. In this article, the efficacy of this EMD-based VAD algorithm is studied at the front end of a text-independent, language-independent SI task. The speakers' data were collected in three languages at five different places (home, street, laboratory, college campus, and restaurant) under realistic conditions using an EDIROL R-09HR 24-bit WAV/MP3 recorder. The performance of the proposed SI task is compared against that obtained with the traditional energy-based VAD in terms of percentage identification rate.
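The baseline front end here is a short-term energy VAD, and results are reported as a percentage identification rate. The abstract does not give the baseline's settings, so the sketch below is illustrative only. It assumes 20-ms frames with a 10-ms shift (the framing the abstract reports for feature extraction) and a hypothetical threshold rule that marks a frame as speech when its log energy lies within a fixed margin (30 dB here) of the utterance's loudest frame. The function names and the margin are assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def frame_signal(x: Sequence[float], fs: int, frame_ms: float = 20.0,
                 shift_ms: float = 10.0) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // shift
    return [list(x[i * shift:i * shift + frame_len]) for i in range(n_frames)]


def energy_vad(x: Sequence[float], fs: int, margin_db: float = 30.0) -> List[bool]:
    """Hypothetical short-term energy VAD: True marks a frame as speech."""
    frames = frame_signal(x, fs)
    if not frames:
        return []
    # Mean-square energy per frame in dB; 1e-12 avoids log10(0) on digital silence.
    energies_db = [10.0 * math.log10(sum(s * s for s in f) / len(f) + 1e-12)
                   for f in frames]
    # Assumed rule: speech if within margin_db of the loudest frame.
    threshold = max(energies_db) - margin_db
    return [e > threshold for e in energies_db]


def identification_rate(predicted: Sequence[str], true: Sequence[str]) -> float:
    """Percentage of test utterances assigned to the correct speaker."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two equal-length, non-empty label sequences")
    correct = sum(p == t for p, t in zip(predicted, true))
    return 100.0 * correct / len(true)


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    # 0.5 s of silence followed by 0.5 s of a 200-Hz tone standing in for speech.
    x = [0.0] * 8000 + [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(8000)]
    vad = energy_vad(x, fs)
    print(sum(vad))  # prints 50: frames 49..98 overlap the tone
    print(identification_rate(["a", "b", "b", "c"], ["a", "b", "c", "c"]))  # prints 75.0
```

A fixed energy threshold of this kind is exactly what tends to fail when the background noise is loud or nonstationary, which is the weakness the EMD-based VAD is meant to address.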
For both VAD front ends, the widely used Mel frequency cepstral coefficients (MFCCs) are computed by frame processing (20-ms frame size and 10-ms frame shift) over the voiced speech regions that the respective VAD technique extracts from the realistic speech utterances. These MFCCs serve as feature vectors for speaker modeling with the popular Gaussian mixture models (GMMs). The experimental results show that the SI task with the ZFFPR- and EMD-based VAD at its front end performs better than the SI task with a short-term energy-based VAD at its front end, and the results are somewhat encouraging.
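The abstract names the back end (MFCC vectors modeled per speaker with GMMs) but not the decision rule. A common closed-set identification rule, assumed here rather than taken from the paper, scores a test utterance's feature vectors against every enrolled speaker's GMM and picks the speaker with the highest total log-likelihood. The sketch assumes diagonal covariances and models that are already trained (for example with EM); MFCC extraction is omitted, and the names `Gmm`, `identify_speaker`, and the toy parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical diagonal-covariance GMM: list of (weight, mean vector, variance vector).
Gmm = List[Tuple[float, List[float], List[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: Gmm) -> float:
    """log p(x | GMM), using the log-sum-exp trick for numerical stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify_speaker(features: Sequence[Sequence[float]],
                     models: Dict[str, Gmm]) -> str:
    """Return the speaker whose GMM gives the highest total log-likelihood.

    Frames are treated as independent, so the utterance score is the sum
    of per-frame log-likelihoods.
    """
    scores = {spk: sum(gmm_log_likelihood(f, gmm) for f in features)
              for spk, gmm in models.items()}
    return max(scores, key=lambda spk: scores[spk])


if __name__ == "__main__":
    # Toy 2-D "feature vectors" standing in for MFCC frames.
    models = {
        "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
        "spk_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
    }
    features = [[0.2, 0.1], [0.9, 1.1], [0.5, 0.4]]
    print(identify_speaker(features, models))  # prints spk_a
```

Summing per-frame log-likelihoods rather than multiplying probabilities avoids numerical underflow on utterances with hundreds of frames, and the log-sum-exp step does the same within each mixture.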

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Information Systems, Software

Cited by 3 articles.