Pyramid Feature Attention Network for Speech Resampling Detection

Published: 2024-06-01 Issue: 11 Volume: 14 Page: 4803
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Zhou Xinyu¹, Zhang Yujin¹, Wang Yongqi¹, Tian Jin¹, Xu Shaolun²

Affiliation:

1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract

Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Speech resampling is a post-processing operation used in various speech-tampering methods, so its forensic detection is of great significance. Most previous work on speech resampling detection used traditional feature extraction and classification methods to distinguish original speech from forged speech. Given the powerful feature-extraction ability of deep learning, this paper converts the speech signal into a spectrogram with time-frequency characteristics and uses a feature pyramid network (FPN) with the Squeeze-and-Excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines low-level location information with high-level semantic information, which dramatically improves speech resampling detection performance. Experiments were carried out on a resampling corpus built from the TIMIT dataset. The results indicate that the proposed method significantly improves detection accuracy for various kinds of resampled speech. For tampered speech with a resampling factor of 0.9, detection accuracy increases by nearly 20%. In addition, the robustness test demonstrates that the proposed model is strongly resistant to MP3 compression and that its overall performance is better than that of existing methods.
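
The SE attention mechanism named in the abstract reweights feature channels in three steps: squeeze (global average pooling per channel), excitation (a small two-layer bottleneck ending in a sigmoid), and scale (multiplying each channel by its gate). The following is a minimal pure-Python sketch of that general idea only, not the network described in the paper. The function name `squeeze_excite`, the toy 4-channel 2x2 input, the reduction ratio of 2, the omission of biases, and the random weights are all assumptions made for illustration.

```python
import math
import random


def squeeze_excite(feature_map, w1, w2):
    """Reweight the channels of a C x H x W feature map (nested lists).

    w1: (C // r) x C weights of the reduction layer (hypothetical, no bias).
    w2: C x (C // r) weights of the expansion layer (hypothetical, no bias).
    Returns (scaled_feature_map, channel_gates).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: project down to C // r values, then apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: project back to C values, then apply a sigmoid
    # so that every channel gate lies strictly between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Hypothetical toy input: 4 channels of 2x2 spectrogram-like features.
random.seed(0)
channels, reduced = 4, 2  # assumed reduction ratio r = 2
fmap = [
    [[random.random() for _ in range(2)] for _ in range(2)]
    for _ in range(channels)
]
w1 = [[random.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
w2 = [[random.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]

scaled, gates = squeeze_excite(fmap, w1, w2)
print([round(g, 3) for g in gates])  # one gate per channel
```

In a trained network the two weight matrices are learned, so channels that carry resampling traces can receive gates near 1 while uninformative channels are suppressed. Here the weights are random, so the printed gates only show the mechanics.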

Funder

National Natural Science Foundation of China

Natural Science Foundation of Shanghai

Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security

Innovation Fund for Industry-University-Research of Chinese Universities

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/11/4803/pdf
