Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments

Published:2021-11-20 Issue:22 Volume:21 Page:7728
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Kubicki Kacper, Kapusta Paweł, Ślot Krzysztof

Abstract

This paper addresses the detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training that ensures high phoneme recognition accuracy even for significantly simplified network architectures, thus enabling efficient utterance verification on resource-limited hardware such as mobile phones or embedded devices. We consider Deep Convolutional Neural Networks operating on windows of speech Mel-spectrograms as a means of phoneme recognition, and we show that the performance of highly simplified neural architectures can be boosted by modifying the principle underlying training set construction. Instead of generating training examples by slicing spectrograms with a sliding window, as is commonly done, we propose to maximize the consistency of the phoneme-related spectrogram structures to be learned by choosing only spectrogram chunks from the central regions of phoneme articulation intervals. This approach makes better use of the limited capacity of the simplified networks, as it significantly reduces within-class data scatter. We show that neural architectures comprising as few as tens of thousands of parameters can solve the 39-phoneme recognition task with an accuracy of up to 76% (we use the English-language TIMIT database for experimental verification of the method). We also show that ensembling simple classifiers with a basic bagging method boosts recognition accuracy by another 2–3%, yielding Phoneme Error Rates of about 23%, which approaches the accuracy of state-of-the-art deep neural architectures that are one to two orders of magnitude more complex than the proposed solution. This, in turn, enables reliable presentation attack detection based on challenges just a few syllables long, on highly resource-limited computing hardware.
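The difference between the two ways of building a training set described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame-index representation of phoneme intervals, the `width` and `step` parameters, the rule of skipping phonemes shorter than the window, and the majority-vote combiner are all assumptions made for illustration only.

```python
from collections import Counter


def central_fragments(intervals, width):
    """Return one (start, end, label) window per phoneme, centred on its interval.

    intervals: list of (start_frame, end_frame, label), end exclusive.
    width: fragment width in spectrogram frames (hypothetical parameter).
    Assumption: phonemes shorter than the window are skipped.
    """
    fragments = []
    for start, end, label in intervals:
        if end - start < width:
            continue
        centre = (start + end) // 2
        left = centre - width // 2
        fragments.append((left, left + width, label))
    return fragments


def sliding_fragments(intervals, width, step):
    """Baseline: slide a window over the whole utterance.

    Each window is labelled with the phoneme under its centre frame, so
    windows may straddle phoneme boundaries. Assumes the intervals are
    contiguous and start at frame 0.
    """
    total = intervals[-1][1]
    fragments = []
    for left in range(0, total - width + 1, step):
        centre = left + width // 2
        label = next(lab for s, e, lab in intervals if s <= centre < e)
        fragments.append((left, left + width, label))
    return fragments


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one fragment."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical utterance: three phonemes given as frame intervals.
utterance = [(0, 10, "sh"), (10, 16, "iy"), (16, 30, "hh")]
central = central_fragments(utterance, width=8)
sliding = sliding_fragments(utterance, width=8, step=4)
```

In this sketch every central fragment lies entirely inside one phoneme interval, whereas sliding windows can overlap neighbouring phonemes, which is the source of the within-class scatter the abstract refers to. The `majority_vote` helper only shows how the outputs of several bagged classifiers might be combined; the abstract does not say which combination rule the authors use.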

Funder

Polish National Centre for Research and Development

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/22/7728/pdf

Cited by 1 article.

1. Sensors and Pattern Recognition Methods for Security and Industrial Applications;Sensors;2022-08-10