Vocoder Detection of Spoofing Speech Based on GAN Fingerprints and Domain Generalization

Published: 2023-10-28
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Li Fan¹, Chen Yanxiang¹, Liu Haiyang¹, Zhao Zuxing¹, Yao Yuanzhi¹, Liao Xin²

Affiliation:

1. Hefei University of Technology, China

2. Hunan University, China

Abstract

As an important part of a text-to-speech (TTS) system, the vocoder converts acoustic features into speech waveforms. Differences among vocoders are key to producing different types of forged speech in TTS systems. With the rapid development of generative adversarial networks (GANs), an increasing number of GAN vocoders have been proposed. Detectors often encounter vocoders of unknown types, which degrades their generalization performance. However, existing studies have not addressed detection generalization across GAN vocoders. To solve this problem, this study proposes a vocoder detection framework for spoofed speech based on GAN fingerprints and domain generalization. The framework widens the distance between real speech and forged speech in feature space, improving the detection model’s performance. Specifically, we use an autoencoder-based fingerprint extractor to extract GAN fingerprints from vocoders. We then apply them as weights to the forged speech for subsequent classification, so that highly discriminative forged-speech features are learned. Domain generalization is then used to further improve the model’s ability to generalize to unseen forgery types. We achieve domain generalization using domain-adversarial learning and an asymmetric triplet loss, which together learn a better generalized feature space in which real speech is compact and forged speech synthesized by different vocoders is dispersed. Finally, to optimize the training process, curriculum learning is used to dynamically adjust the contributions of samples of different difficulty during training. Experimental results show that the proposed method achieves state-of-the-art detection results on four GAN vocoders. The code is available at https://github.com/multimedia-infomation-security/GAN-Vocoder-detection.
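The abstract does not give the exact form of the asymmetric triplet loss, so the following is only a minimal sketch of one common construction, under stated assumptions: all real speech is mapped to a single class, each vocoder's forged speech is mapped to its own class, and a standard margin-based triplet loss is averaged over all valid triplets in a batch. The function names, the Euclidean distance, and the margin value are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
import math


def asymmetric_labels(is_real, vocoder_ids):
    """Assumed relabeling: every real sample becomes class 0,
    every forged sample becomes class 1 + its vocoder id.
    The vocoder id of a real sample is ignored."""
    return [0 if real else 1 + vid for real, vid in zip(is_real, vocoder_ids)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(embeddings, labels, margin=1.0):
    """Mean of max(0, d(anchor, positive) - d(anchor, negative) + margin)
    over all (anchor, positive, negative) triplets in the batch.
    Returns 0.0 if the batch contains no valid triplet."""
    total, count = 0.0, 0
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's class but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            for q in range(n):
                # A negative belongs to any other class.
                if labels[q] == labels[a]:
                    continue
                d_an = euclidean(embeddings[a], embeddings[q])
                total += max(0.0, d_ap - d_an + margin)
                count += 1
    return total / count if count else 0.0
```

With this relabeling, real samples from every source are pulled toward one another, while forged samples are pulled together only within the same vocoder and pushed away from both real speech and other vocoders, which matches the "real compact, forged dispersed" feature space the abstract describes.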

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3630751

References (47 articles)

1. Superloss: A generic loss for robust curriculum learning; Castells Thibault; Advances in Neural Information Processing Systems, 2020

2. Zhuxin Chen, Zhifeng Xie, Weibin Zhang, and Xiangmin Xu. 2017. ResNet and Model Fusion for Automatic Spoofing Detection. In Interspeech. 102–106.

3. Yaroslav Ganin and Victor Lempitsky. 2015. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning. PMLR, 1180–1189.

4. Alex Graves, Marc G Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu. 2017. Automated curriculum learning for neural networks. In International Conference on Machine Learning. PMLR, 1311–1320.

5. An efficient MFCC extraction method in speech recognition

Cited by 1 article.

1. Introduction to the Special Issue on Integrity of Multimedia and Multimodal Data in Internet of Things; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-03-08