Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification

Published:2023-07-24 Issue:14 Volume:23 Page:6637
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Mewada Hiren¹, Al-Asad Jawad F.¹, Almalki Faris A.², Khan Adil H.¹, Almujally Nouf Abdullah³, El-Nakla Samir¹, Naith Qamar⁴

Affiliation:

1. Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia

2. Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

3. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

4. Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, P.O. Box 34, Jeddah 21959, Saudi Arabia

Abstract

Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoof attacks are practically unknown; hence, further analysis and development of spoof-detection models are required to improve spoof classification. A study of the spoofed-speech spectrum suggests that high-frequency features discriminate genuine speech from spoofed speech well. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferable to other speech features because of their lower covariance. Therefore, in this study, a Gaussian filter is proposed for the extraction of inverted MFCC (iMFCC) features, which capture high-frequency information. Complementary features are integrated with iMFCC to strengthen the features that aid in the discrimination of spoofed speech. Deep learning has proven efficient in classification applications, but the selection of its hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify the spoofed-speech signal, and we present an extensive investigation using the ASVspoof 2017 dataset. The optimized BiLSTM model was trained in the fewest epochs and achieved a 99.58% validation accuracy. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
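As a rough illustration of the feature-extraction idea in the abstract, the sketch below builds a Gaussian-shaped filter bank on an inverted mel scale (filters packed densely at high frequencies) and computes cepstral coefficients for one frame. It is not the authors' implementation: the abstract does not give the filter design, so the mirroring of the mel scale, the `sigma_scale` width factor, the filter and coefficient counts, and all function names are assumptions made for this example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def inverted_mel_edges(n_filters, sample_rate):
    """Band edges and centers (Hz) that are dense near the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    mel_points = np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2)
    # Assumption: invert the mel scale by mirroring it around the band.
    return np.sort(nyquist - mel_to_hz(mel_points))

def gaussian_filter_bank(n_filters, n_fft, sample_rate, sigma_scale=0.5):
    """Return (bank, centers); bank has shape (n_filters, n_fft // 2 + 1)."""
    edges = inverted_mel_edges(n_filters, sample_rate)
    centers = edges[1:-1]
    # Assumption: sigma is proportional to the spacing of neighbouring centers.
    sigmas = sigma_scale * (edges[2:] - edges[:-2]) / 2.0
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    z = (freqs[None, :] - centers[:, None]) / sigmas[:, None]
    return np.exp(-0.5 * z ** 2), centers

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II of a 1-D vector, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T

def imfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Inverted-MFCC-style coefficients for a single frame (illustrative only)."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    bank, _ = gaussian_filter_bank(n_filters, n_fft, sample_rate)
    log_energy = np.log(bank @ power + 1e-10)  # small floor avoids log(0)
    return dct_ii(log_energy, n_coeffs)

# Usage: one 512-sample frame of a 6 kHz tone sampled at 16 kHz.
t = np.arange(512) / 16000.0
coeffs = imfcc(np.sin(2 * np.pi * 6000.0 * t), 16000)
```

Unlike a triangular filter, each Gaussian filter here has non-zero weight across the whole spectrum, which is one way to read the abstract's remark that a Gaussian filter captures more global information. In a full pipeline, per-frame coefficients such as `coeffs` would be stacked into a sequence and passed to a sequence classifier.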

Funder

Princess Nourah bint Abdulrahman University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/14/6637/pdf

References: 75 articles.

1. Wu (2015). Spoofing and countermeasures for speaker verification: A survey. Speech Commun.

2. Kinnunen, T., Sahidullah, M., Delgado, H., Todisco, M., Evans, N., Yamagishi, J., and Lee, K.A. (2017). The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection, The International Speech Communication Association.

3. Ghaderpour, E., Pagiatakis, S.D., and Hassan, Q.K. (2021). A survey on change detection and time series analysis with applications. Appl. Sci., 11.

4. Mewada (2020). Wavelet features embedded convolutional neural network for multiscale ear recognition. J. Electron. Imaging.

5. Alim, S.A., and Rashid, N.K.A. (2018). Some Commonly Used Speech Feature Extraction Algorithms, IntechOpen.

Cited by 3 articles.

1. Fast Gaussian Filter Approximations Comparison on SIMD Computing Platforms;Applied Sciences;2024-05-29

2. A Deep Learning Network for Classification and Visual Deterioration Detection of Concrete Surfaces;2024 IEEE World AI IoT Congress (AIIoT);2024-05-29

3. Derin Sahte Ses Manipülasyonu Tespit Sistemleri Üzerine Bir Derleme [A Review of Deepfake Voice Manipulation Detection Systems];Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi;2024-04-30