Affiliation:
1. School of Theater Drama and Art College, Shenyang Normal University, No.253 Huanghe North Avenue, Huanggu District, Shenyang, Liaoning 110136, China
Abstract
Speech is one of the most sophisticated human motor skills. Speaker identification is the ability of a software or hardware component to acquire a speech signal and identify the speakers present in it. This study proposes a fluctuating equation inversion method using feature extraction for broadcast hosting. Feature extraction aims to derive useful signal features from natural audio that can be applied to various downstream processes, including recitation, evaluation, and categorization. Data were first collected from the CASIA dataset. The experimental outcomes of the proposed approach were then evaluated using mel-frequency cepstral coefficients, gammatone frequency cepstral coefficients, and linear frequency cepstral coefficients. The proposed technique was tested on a publicly accessible dataset, and the findings showed that it outperformed conventional feature extraction methods in terms of recognition accuracy (98%), precision (97%), recall (96.05%), sensitivity (92.56%), and F1-score (95.09%). The proposed approach can be used to improve audio signal quality and user experience across broadcast-hosting applications.
Publisher
Fuji Technology Press Ltd.