Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech

Published: 2022-03-16 Issue: 3 Volume: 24 Page: 414
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Simić Nikola, Suzić Siniša, Nosek Tijana, Vujović Mia, Perić Zoran, Savić Milan, Delić Vlado

Abstract

Speaker recognition is an important classification task that can be solved using several approaches. Building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task with solutions that provide excellent performance, but the classification accuracy of such models decreases significantly when they are applied to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable for implementation on edge devices in Internet of Things systems for real-time detection. The aim of this paper is to propose a simple, constrained convolutional neural network for speaker recognition and to examine its robustness in emotional speech conditions. We examine three quantization methods for developing a constrained network: the 8-bit floating-point format (floating-point eight), ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
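
To illustrate what binary and ternary scalar quantization of network weights can look like, here is a minimal sketch in plain Python. It is not taken from the paper: the scaling rule (mean absolute weight), the `threshold_factor` of 0.7, and the function names `binary_quantize` and `ternary_quantize` are assumptions borrowed from commonly used formulations, and the authors' quantizer design may differ. The floating-point eight format is not covered here.

```python
from statistics import fmean


def binary_quantize(weights):
    """Map every weight to +alpha or -alpha (two levels).

    Assumption: alpha is the mean absolute weight, a common choice
    that is not confirmed to be the paper's method.
    """
    alpha = fmean(abs(w) for w in weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def ternary_quantize(weights, threshold_factor=0.7):
    """Map every weight to -alpha, 0, or +alpha (three levels).

    Assumption: weights whose magnitude is at most
    threshold_factor * mean(|w|) are set to zero; alpha is the mean
    magnitude of the remaining weights.
    """
    delta = threshold_factor * fmean(abs(w) for w in weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = fmean(kept) if kept else 0.0
    return [0.0 if abs(w) <= delta else (alpha if w > 0 else -alpha)
            for w in weights]


if __name__ == "__main__":
    example = [0.5, -1.5, 0.1, -0.1]  # hypothetical layer weights
    print(binary_quantize(example))
    print(ternary_quantize(example))
```

In this sketch each binary weight needs one bit plus a shared scale, and each ternary weight needs two bits plus a shared scale, which is the kind of storage reduction that makes constrained models attractive for edge devices.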

Funder

Science Fund of the Republic of Serbia

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/3/414/pdf

References: 38 articles.

1. An overview of text-independent speaker recognition: From features to supervectors

2. An overview of automatic speaker recognition technology

3. Speech Technology Progress Based on New Machine Learning Paradigm