Static–dynamic features and hybrid deep learning models based spoof detection system for ASV

Published: 2021-11-19
ISSN: 2199-4536
Container title: Complex & Intelligent Systems
Language: en
Short container title: Complex Intell. Syst.

Author:

Mittal Aakshi, Dua Mohit

Abstract

Spoof detection is essential for improving the performance of current Automatic Speaker Verification (ASV) systems, and strengthening both the frontend and the backend helps build robust ASV systems. First, this paper compares the performance of static and static–dynamic Constant Q Cepstral Coefficients (CQCC) frontend features, using a Long Short Term Memory (LSTM) with Time Distributed Wrappers model at the backend. Second, it presents a comparative analysis of ASV systems built with three deep learning models at the backend (LSTM with Time Distributed Wrappers, LSTM, and Convolutional Neural Network (CNN)) and static–dynamic CQCC features at the frontend. Third, it discusses the implementation of two spoof detection systems for ASV that use the same static–dynamic CQCC features at the frontend and different combinations of deep learning models at the backend. The first is a voting-protocol-based two-level spoof detection system that uses CNN and LSTM models at the first level and an LSTM with Time Distributed Wrappers model at the second level. The second is a two-level spoof detection system with a user identification and verification protocol, which uses an LSTM model for user identification at the first level and an LSTM with Time Distributed Wrappers model for verification at the second level. To implement the proposed work, a variation of the ASVspoof 2019 dataset is used that brings all types of spoofing attacks, namely Speech Synthesis (SS), Voice Conversion (VC), and replay, into a single dataset. The results show that, at the frontend, static–dynamic CQCC features outperform static CQCC features and that, at the backend, hybrid combinations of deep learning models increase the accuracy of spoof detection systems.
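As a rough illustration of what "static–dynamic" features commonly means in speech processing, the sketch below appends first-order (delta) and second-order (delta-delta) time derivatives to a matrix of static cepstral coefficients. The abstract does not state the delta window, the number of CQCC coefficients, or how the dynamic coefficients were computed, so the function names `delta` and `static_dynamic`, the window width of 2, and the random stand-in matrix are all assumptions made for illustration, not the authors' configuration. CQCC extraction itself is not shown.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along time for a (frames, coeffs) matrix.

    The window width is an assumed value, not taken from the paper.
    """
    num_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]    # frame t + n
        behind = padded[width - n : width - n + num_frames]   # frame t - n
        out += n * (ahead - behind)
    return out / denom


def static_dynamic(static: np.ndarray, width: int = 2) -> np.ndarray:
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(static, width)
    d2 = delta(d1, width)
    return np.concatenate([static, d1, d2], axis=1)


# Stand-in for a static CQCC matrix: 100 frames x 20 coefficients (hypothetical sizes).
rng = np.random.default_rng(0)
static = rng.standard_normal((100, 20))
print(static_dynamic(static).shape)  # (100, 60)
```

Under these assumptions the feature dimension triples while the number of frames stays the same, so a sequence model such as an LSTM receives, per frame, both the coefficient values and how they are changing over time.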

Publisher

Springer Science and Business Media LLC

Subject

General Earth and Planetary Sciences, General Environmental Science

Link

https://link.springer.com/content/pdf/10.1007/s40747-021-00565-w.pdf

References: 46 articles.

1. Beranek B (2013) Voice biometrics: success stories, success factors and what’s next. Biometr Technol Today 2013(7):9–11

2. Indumathi A, Chandra E (2012) Survey on speech synthesis. Signal Process Int J (SPIJ) 6(5):140

3. Lim R, Kwan E (2011) Voice conversion application (VOCAL). In: 2011 international conference on uncertainty reasoning and knowledge engineering, vol 1. IEEE, pp 259–262

4. Mohammadi SH, Kain A (2017) An overview of voice conversion systems. Speech Commun 88:65–82

5. Patil HA, Kamble MR (2018) A survey on replay attack detection for automatic speaker verification (ASV) system. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE, pp 1047–1053

Cited by 16 articles.

1. Detecting Audio Deepfakes: Integrating CNN and BiLSTM with Multi-Feature Concatenation;Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security;2024-06-24

2. SFC-NIDS: a sustainable and explainable flow filtering based concept drift-driven security approach for network introspection;Cluster Computing;2024-04-29

3. Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection;International Journal of Speech Technology;2024-03

4. A review on Gujarati language based automatic speech recognition (ASR) systems;International Journal of Speech Technology;2024-03

5. A lightweight feature extraction technique for deepfake audio detection;Multimedia Tools and Applications;2024-01-25