Exploring the Impact of Mismatch Conditions, Noisy Backgrounds, and Speaker Health on Convolutional Autoencoder-Based Speaker Recognition System with Limited Dataset

Authors:

Niwatkar Arundhati, Kanse Yuvraj, Kushwaha Ajay Kumar

Abstract

This paper presents a novel approach for improving the success rate and accuracy of speaker recognition and identification systems trained on limited data. Data augmentation is used to enrich a small dataset of audio recordings from five speakers, covering both male and female voices. Data processing is done in Python, and a convolutional autoencoder serves as the model. Speech signals are converted into spectrogram images, which are the input for training the autoencoder. The resulting speaker recognition system is compared against traditional systems that rely on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction. Beyond the challenges of a small dataset, the paper examines the impact of a "mismatch condition", in which the audio signals used for training and testing differ in duration. Experiments with various activation and loss functions identify the best-performing pair for the small dataset, yielding a success rate of 92.4% in matched conditions. MFCC features have traditionally been widely used for speaker recognition. However, the COVID-19 pandemic has drawn attention to the virus's effect on the parts of the body involved in speech production, such as the chest, throat, and vocal cords. COVID-19 symptoms such as coughing, breathing difficulties, and throat swelling raise the question of how the virus influences MFCC, pitch, jitter, and shimmer features. The research therefore also investigates the potential effects of COVID-19 on these features, with the aim of informing the development of robust speaker recognition systems.
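The spectrogram step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the frame length, hop size, Hann window, dB floor, and the synthetic 1 kHz tone are all assumptions chosen for illustration, and the function names are hypothetical. It uses only the Python standard library, with a naive DFT in place of the FFT routine a real pipeline would use.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point symmetric Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Short-time magnitude spectrum: one row per frame, one column per bin.

    Uses a naive DFT for clarity; frame_len and hop are illustrative defaults.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


def to_log_image(spec, floor_db=-80.0):
    """Convert magnitudes to dB relative to the peak and scale to [0, 1]."""
    peak = max(max(row) for row in spec)
    image = []
    for row in spec:
        out = []
        for mag in row:
            db = 20.0 * math.log10(max(mag, 1e-12) / peak)
            out.append((max(db, floor_db) - floor_db) / -floor_db)
        image.append(out)
    return image


# Synthetic stand-in for speech: a 1 kHz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]
spec = magnitude_spectrogram(tone)
image = to_log_image(spec)
```

Each row of `image` is one time frame and each column one frequency bin, scaled to the range 0 to 1, so the result can be treated as a grayscale image of the kind an image-based model such as a convolutional autoencoder takes as input.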

Publisher

European Alliance for Innovation n.o.

