Authors:
Subramani Nishant, Rao Delip
Abstract
Synthetic speech, or “fake speech”, that matches personal vocal traits has become better and cheaper due to advances in deep learning-based speech synthesis and voice conversion approaches. The increased accessibility of synthetic speech systems and their growing misuse highlight the critical need to build countermeasures. Furthermore, new synthesis models emerge constantly, and previously trained detection models perform poorly on these unseen attack vectors. In this paper, we focus on two questions: 1) How can we build highly accurate, yet parameter- and sample-efficient, models for fake speech detection? 2) How can we rapidly adapt detection models to new sources of fake speech? We present four parameter-efficient convolutional architectures for fake speech detection that achieve best detection F1 scores of around 97 points on a large dataset of fake and bonafide speech. We show that the fake speech detection task naturally lends itself to a novel multi-task formulation, which further improves F1 scores at the cost of a mere 0.5% increase in model parameters. Our multi-task setting also helps in data-sparse situations, which are commonplace in adversarial settings. We investigate an alternative approach to the data-sparsity problem using transfer learning and show that it is possible to match purely supervised detection performance on unseen attack vectors with as little as 6.25% of the training data. This is the first known application of transfer learning in adversarial settings for speech. Finally, we show how well our transfer learning approach adapts in an instance-efficient way to new attack vectors using the Real-Time Voice Cloning toolkit. We exceed the purely supervised detection performance (99.18 F1) with as little as 6.25% of the data.
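For reference, the F1 scores quoted above are, by the standard definition, the harmonic mean of precision and recall, reported here on a 0 to 100 "points" scale. The sketch below computes binary F1 for a fake-vs-bonafide labeling. It assumes fake speech is the positive class (label 1); the function name and the example labels are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall).

    Assumption: `positive` marks the fake-speech class; everything else is bonafide.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: fake utterances correctly flagged as fake.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: bonafide utterances wrongly flagged as fake.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: fake utterances that slipped through as bonafide.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = fake, 0 = bonafide.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
# Scale to "points" (0-100), as in the abstract.
print(round(100 * f1_score(y_true, y_pred), 2))  # prints 75.0
```

Here one fake utterance is missed and one bonafide utterance is misflagged, so precision and recall are both 0.75 and F1 is 75 points.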
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
16 articles.
1. Audio-deepfake detection: Adversarial attacks and countermeasures; Expert Systems with Applications; 2024-09
2. PS3DT: Synthetic Speech Detection Using Patched Spectrogram Transformer; 2023 International Conference on Machine Learning and Applications (ICMLA); 2023-12-15
3. Combating Misinformation in the Era of Generative AI Models; Proceedings of the 31st ACM International Conference on Multimedia; 2023-10-26
4. Hidden-in-Wave: A Novel Idea to Camouflage AI-Synthesized Voices Based on Speaker-Irrelative Features; 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE); 2023-10-09
5. Enhancing Synthesized Speech Detection with Dual Attention Using Features Fusion; 2023 International Conference on Computer Applications Technology (CCAT); 2023-09-15