Affiliation:
1. Vocational School of Technical Sciences, Ordu University, Ordu 52200, Turkey
Abstract
A speech signal can provide various information about a speaker, such as their gender, age, accent, and emotional state. The gender of the speaker is the most salient piece of information contained in the speech signal and is directly or indirectly used in many applications. In this study, a new approach is proposed for recognizing the gender of the speaker based on hybrid features created by stacking different types of features. For this purpose, five different features, namely Mel frequency cepstral coefficients (MFCC), Mel scaled power spectrogram (Mel Spectrogram), Chroma, Spectral contrast (Contrast), and Tonal Centroid (Tonnetz), and twelve hybrid features created by stacking these features were used. These features were applied to four different classifiers, two based on traditional machine learning (KNN and LDA) and two based on deep learning (CNN and MLP), and the performance of each was evaluated separately. In the experiments conducted on the Turkish subset of the Common Voice dataset, hybrid features created by stacking different acoustic features improved gender recognition accuracy by 0.3% to 1.73%.
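The stacking step described in the abstract can be illustrated with a minimal sketch. It assumes each feature has already been reduced to a fixed-length vector per utterance (for example, by averaging over time) and that stacking means concatenating those vectors. The dimensions in `FEATURE_DIMS`, the helper names, and the random placeholder values are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

# Assumed per-feature vector lengths (hypothetical, not from the source).
FEATURE_DIMS = {
    "mfcc": 40,
    "mel": 128,
    "chroma": 12,
    "contrast": 7,
    "tonnetz": 6,
}


def placeholder_features(rng):
    """Return random stand-ins for one utterance's fixed-length features."""
    return {name: rng.standard_normal(dim) for name, dim in FEATURE_DIMS.items()}


def stack_features(features, names):
    """Concatenate the selected feature vectors into one hybrid vector."""
    return np.concatenate([features[name] for name in names])


rng = np.random.default_rng(0)
utterance = placeholder_features(rng)

# One possible hybrid feature: MFCC + Mel Spectrogram + Chroma.
hybrid = stack_features(utterance, ["mfcc", "mel", "chroma"])

# Hybrid feature built from all five base features.
all_five = stack_features(utterance, list(FEATURE_DIMS))

print(hybrid.shape, all_five.shape)
```

In a real pipeline the placeholder vectors would be replaced by features extracted from the audio, and each hybrid vector would be passed to a classifier such as KNN, LDA, MLP, or CNN.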