Age group classification and gender recognition from speech with temporal convolutional neural networks

Published: 2022-01 Issue: 3 Volume: 81 Page: 3535-3552
ISSN: 1380-7501
Container-title: Multimedia Tools and Applications
Language: en
Short-container-title: Multimed Tools Appl

Author:

Sánchez-Hevia Héctor A., Gil-Pita Roberto, Utrilla-Manso Manuel, Rosa-Zurera Manuel

Abstract

This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Media Technology, Software

Link

https://link.springer.com/content/pdf/10.1007/s11042-021-11614-4.pdf

Reference: 43 articles.

1. Abadi M, Agarwal A, Barham P, et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. http://tensorflow.org/. Software available from tensorflow.org

2. Abdel-Hamid O, Abdel-Rahman M, Jiang H, Deng L, Penn G, Yu D (2014) Convolutional neural network for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(10):1533–1545

3. Badshah A, Ahmad J, Rahim N, Baik S (2017) Speech emotion recognition from spectrograms with deep convolutional neural network. In: 2017 International conference on platform technology and service (PlatCon), pp 1–5

4. Bahari M, McLaren M, Van Leeuwen D, et al (2012) Age estimation from telephone speech using i-vectors. In: Proceedings of Interspeech 2012. Portland, USA

5. Bhat C, Mithum B, Saxena V, Kulkarni V, Kopparapu S (2013) Deploying usable speech enabled IVR systems for mass use. In: 2013 IEEE international conference on human computer interaction (ICHCI), pp 1–5

Cited by 18 articles.

1. Konuşmacıları Kadın, Erkek ve Çocuk Olarak Sınıflandırmada Veri Artırmanın Performansa Etkisi;Iğdır Üniversitesi Fen Bilimleri Enstitüsü Dergisi;2024-09-01

2. Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson’s Disease Detection and Speech Features Extraction;Sensors;2024-08-26

3. Automatic Age and Gender Recognition Using Ensemble Learning;Applied Sciences;2024-08-06

4. A Quest for Formant-Based Compact Nonuniform Trapezoidal Filter Banks for Speech Processing with VGG16;Circuits, Systems, and Signal Processing;2024-07-29

5. Analyzing wav2vec embedding in Parkinson’s disease speech: A study on cross-database classification and regression tasks;2024-04-12