Whisper40: A Multi-Person Chinese Whisper Speaker Recognition Dataset Containing Same-Text Neutral Speech

Published:2024-03-28 Issue:4 Volume:15 Page:184
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Yang Jingwen¹, Zhou Ruohua¹

Affiliation:

1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China

Abstract

Whisper speaker recognition (WSR) has received extensive attention from researchers in recent years and plays an important role in medical, judicial, and other fields. Building a whisper dataset is essential to the study of WSR. However, existing whisper datasets suffer from a small number of speakers, short speech duration, and a lack of neutral speech with the same text as the whispered speech in the same dataset. To address these issues, we present Whisper40, a multi-person Chinese WSR dataset containing same-text neutral speech, spanning around 655.90 min and sourced from volunteers. In addition, we use the current state-of-the-art speaker recognition model to build a WSR baseline system. Drawing on transfer learning, we pre-train the speaker recognition model on neutral speech datasets and transfer the empirical knowledge of specific network layers to the WSR system. The Whisper40 and CHAINs datasets are then used to fine-tune the model with the transferred layers. The experimental results show that the Whisper40 dataset is practical and that the time delay neural network (TDNN) model performs well in both same-scene and cross-scene experiments. After transfer learning, the equal error rate (EER) of Chinese WSR is reduced by 27.62%.
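
The abstract reports its results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how that metric can be approximated from verification scores, the Python snippet below sweeps every observed score as a decision threshold. The function name `equal_error_rate` and the score lists are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' evaluation code may differ.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical similarity scores (assumed for illustration only).
targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trials
nontargets = [0.6, 0.3, 0.2, 0.1]   # different-speaker trials
print(equal_error_rate(targets, nontargets))  # prints 0.25
```

With these toy scores, a threshold of 0.6 rejects one of four same-speaker trials and accepts one of four different-speaker trials, so both error rates are 25%. Real evaluations typically interpolate between thresholds over many more trials.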

Publisher

MDPI AG

Link

https://www.mdpi.com/2078-2489/15/4/184/pdf
