Gender-based speaker recognition from speech signals using GMM model

Authors:

Gupta Manish (1), Bharti Shambhu Shankar (2), Agarwal Suneeta (2)

Affiliation:

1. Computer Science and Engineering Department, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, Uttar Pradesh-211004, India

2. Motilal Nehru National Institute of Technology Allahabad, Prayagraj, Uttar Pradesh-211004, India

Abstract

Speech is a convenient medium for communication among human beings. Speaker recognition is the process of automatically recognizing a speaker by processing the information contained in the speech signal. In this paper, a new two-level approach is proposed for speaker recognition from speech signals. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized within the gender identified at the first level. Once the gender is known, the search space for the second level is halved, because the speaker recognition system searches only the set of speech signals belonging to the identified gender. Gender is identified using gender-specific features: Mel Frequency Cepstral Coefficients (MFCC) and pitch. The speaker is recognized using speaker-specific features: MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: “IIT-Madras speech synthesis and recognition” (containing English speech samples spoken by eight male and eight female speakers from eight different regions) and “ELSDSR” (containing English speech samples spoken by five male and five female speakers). Experimentally, it is observed that the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without first identifying the gender (the single-level approach). The accuracy of speaker recognition also improves, from 99.7% with the single-level approach to 99.9% with the proposed approach. The experiments indicate that a speech signal of a minimum 1.12 duration (measured after discarding silent parts) is sufficient for recognizing the speaker.
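The two-level search described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses the Python standard library alone, a hypothetical pitch threshold stands in for the SVM gender classifier, a single diagonal Gaussian per speaker stands in for a full GMM, and synthetic three-dimensional frames (pitch plus two made-up coefficients) stand in for MFCC, pitch, and RASTA-PLP features. All names and numeric values are invented for the example.

```python
import math
import random


def fit_gaussian(frames):
    """Fit one diagonal Gaussian per speaker (a one-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def predict_gender(frames, pitch_threshold):
    """Level 1: assumed threshold on mean pitch (stand-in for the SVM)."""
    mean_pitch = sum(f[0] for f in frames) / len(frames)
    return "male" if mean_pitch < pitch_threshold else "female"


def identify_speaker(frames, models_by_gender, pitch_threshold):
    """Level 2: score only the speaker models of the predicted gender."""
    gender = predict_gender(frames, pitch_threshold)
    candidates = models_by_gender[gender]
    best = max(candidates,
               key=lambda name: avg_log_likelihood(frames, candidates[name]))
    return gender, best, len(candidates)


def synth_frames(rng, pitch, c1, c2, n=200):
    """Synthetic frames [pitch_hz, coeff1, coeff2]; purely illustrative data."""
    return [[rng.gauss(pitch, 10.0), rng.gauss(c1, 1.0), rng.gauss(c2, 1.0)]
            for _ in range(n)]


rng = random.Random(0)
# Hypothetical speakers: (mean pitch in Hz, mean coeff1, mean coeff2).
speakers = {
    "male": {"m1": (110, -3, 0), "m2": (125, 3, 2)},
    "female": {"f1": (200, -3, 1), "f2": (220, 3, -2)},
}
models_by_gender = {
    gender: {name: fit_gaussian(synth_frames(rng, *params))
             for name, params in group.items()}
    for gender, group in speakers.items()
}
PITCH_THRESHOLD = 165.0  # assumed value, not taken from the paper

test_frames = synth_frames(rng, *speakers["female"]["f2"], n=100)
gender, speaker, searched = identify_speaker(
    test_frames, models_by_gender, PITCH_THRESHOLD)
print(gender, speaker, searched)  # female f2 2
```

In this toy setup only two of the four speaker models are scored at the second level, which mirrors the halved search space the abstract describes. A gender error at the first level cannot be corrected at the second, so the first-level classifier needs to be reliable.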

Publisher

World Scientific Pub Co Pte Lt

Subject

Condensed Matter Physics, Statistical and Nonlinear Physics

Cited by 14 articles.

1. Speech‐Based Dialect Identification for Tamil; Automatic Speech Recognition and Translation for Low Resource Languages; 2024-03-29

2. Towards modeling raw speech in gender identification of children using sincNet over ERB scale; International Journal of Speech Technology; 2023-09

3. Text-Independent Speaker Recognition System Using Feature-Level Fusion for Audio Databases of Various Sizes; SN Computer Science; 2023-07-18

4. Cross B-HUB Based RNN with Random Aural-Feature Extraction for Enhanced Speaker Extraction and Speaker Recognition; Wireless Personal Communications; 2023-03-27

5. Estimation of Recall Values and Accuracy of Gender Identification for the Different Age Groups Based on Voice Signals; 2023 6th International Conference on Information Systems and Computer Networks (ISCON); 2023-03-03
