Significance of relative phase features for shouted and normal speech classification-Reference-Cited by-同舟云学术

Significance of relative phase features for shouted and normal speech classification

Published:2024-01-06 Issue:1 Volume:2024 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Phapatanaburi Khomdet,Wang Longbiao^ORCID,Liu Meng,Nakagawa Seiichi,Jumphoo Talit,Uthansakul Peerapong

Abstract

AbstractShouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP) and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than those of the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields an improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13636-023-00324-4.pdf

Reference41 articles.

1. J. Campbell, Speaker recognition: a tutorial. Proc. IEEE 85, 1437–1462 (1997)

2. X. He, L. Deng, Speech-centric information processing: an optimization-oriented approach. Proc. IEEE 101, 1116–1135 (2013)

3. J. Li, L. Deng, Y. Gong, R. Haeb-Umbach, An overview of noise-robust automatic speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22, 745–777 (2014)

4. I. Shahin, Speaker identification in the shouted environment using suprasegmental hidden Markov models. Signal Process. 88, 2700–2708 (2008)

5. E. Jokinen, R. Saeidi, T. Kinnunen, P. Alku, Vocal effort compensation for MFCC feature extraction in a shouted versus normal speaker recognition task. Comput. Speech Lang. 53, 1–11 (2019)