Whispered Speech Detection Using Glottal Flow-Based Features

Authors

Phapatanaburi Khomdet, Pathonsuwan Wongsathon, Wang Longbiao, Anchuen Patikorn, Jumphoo Talit, Buayai Prawit, Uthansakul Monthippa, Uthansakul Peerapong

Abstract

Recent studies have reported that the performance of Automatic Speech Recognition (ASR) technologies designed for normal speech deteriorates notably when they are evaluated on whispered speech. Detecting whispered speech is therefore useful for reducing the mismatch between training and testing conditions. This paper proposes two new Glottal Flow (GF)-based features for whispered speech detection: the GF-based Mel-Frequency Cepstral Coefficient (GF-MFCC), a magnitude-based feature, and the GF-based relative phase (GF-RP), a phase-based feature. The main contribution of the proposed features is that they extract magnitude and phase information from the GF signal. In GF-MFCC, Mel-frequency cepstral coefficient (MFCC) feature extraction is modified so that the GF signal estimated by iterative adaptive inverse filtering replaces the raw speech signal as the input. Similarly, GF-RP modifies relative phase (RP) feature extraction by using the GF signal instead of the raw speech signal. Whispered speech production yields a lower amplitude from the glottal source than normal speech production; consequently, the Discrete Fourier Transform (DFT) of whispered speech provides lower magnitude and phase information, which distinguishes it from normal speech. It is therefore hypothesized that both proposed features are useful for whispered speech detection. In addition to the individual GF-MFCC and GF-RP features, feature-level and score-level combinations are proposed to further improve detection performance. The proposed features and combinations are evaluated on the CHAIN corpus. The proposed GF-MFCC outperforms MFCC, and GF-RP outperforms RP. The feature-level combinations of MFCC with GF-MFCC (MFCC&GF-MFCC) and of RP with GF-RP (RP&GF-RP) improve further on either constituent feature alone. Finally, the score-level combination of MFCC&GF-MFCC and RP&GF-RP gives the best results: a frame-level accuracy of 95.01% and an utterance-level accuracy of 100%.
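The two combination strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, the classifier, or any weights, so everything below is an assumption made for illustration: feature-level combination is shown as per-frame concatenation, score-level combination as a weighted sum with a hypothetical weight `alpha`, and the utterance-level decision as a threshold on the mean frame score. The function names, the 13-dimensional toy features, and the toy scores are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def combine_features(feat_a, feat_b):
    """Feature-level combination (assumed here to be per-frame concatenation)."""
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")
    return np.hstack([feat_a, feat_b])

def combine_scores(score_a, score_b, alpha=0.5):
    """Score-level combination (assumed here to be a weighted sum).

    alpha is a hypothetical fusion weight, not a value from the paper.
    """
    return alpha * np.asarray(score_a) + (1.0 - alpha) * np.asarray(score_b)

def utterance_decision(frame_scores, threshold=0.0):
    """Assumed utterance-level rule: whispered if the mean frame score exceeds threshold."""
    return bool(np.mean(frame_scores) > threshold)

# Toy stand-ins for per-frame MFCC and GF-MFCC matrices (100 frames x 13 coefficients).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))
gf_mfcc = rng.normal(size=(100, 13))
combined = combine_features(mfcc, gf_mfcc)  # one 26-dimensional vector per frame

# Toy per-frame scores from a magnitude-based and a phase-based system.
score_mag = np.array([0.8, -0.2, 0.5])
score_phase = np.array([0.4, 0.1, -0.3])
fused = combine_scores(score_mag, score_phase, alpha=0.5)
is_whispered = utterance_decision(fused)
```

In this sketch the magnitude-based and phase-based systems are trained separately and only their scores are merged, which is one common way to exploit complementary information; the paper itself should be consulted for the rule actually used.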

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)

Cited by 5 articles.
