Pyramid Feature Attention Network for Speech Resampling Detection

Author:

Zhou Xinyu (1), Zhang Yujin (1), Wang Yongqi (1), Tian Jin (1), Xu Shaolun (2)

Affiliation:

1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract

Speech forgery and tampering, increasingly facilitated by advanced audio editing software, pose significant threats to the integrity and privacy of digital speech avatars. Resampling is a post-processing step common to many speech-tampering operations, so the forensic detection of speech resampling is of great significance. Most previous work on speech resampling detection relied on traditional feature extraction and classification methods to distinguish original speech from forged speech. Given the strong feature-extraction ability of deep learning, this paper converts the speech signal into a spectrogram with time-frequency characteristics and uses a feature pyramid network (FPN) with the squeeze-and-excitation (SE) attention mechanism to learn speech resampling features. The proposed method combines low-level location information with high-level semantic information, which markedly improves the detection of resampled speech. Experiments were carried out on a resampling corpus built from the TIMIT dataset. The results indicate that the proposed method significantly improves detection accuracy across the resampling factors tested. For tampered speech with a resampling factor of 0.9, detection accuracy increases by nearly 20%. In addition, a robustness test demonstrates that the proposed model resists MP3 compression well, and its overall performance is better than that of existing methods.
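The squeeze-and-excitation mechanism named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `squeeze_excite`, the toy feature-map size, the reduction ratio, and the random weights are all assumptions made for illustration, and biases are omitted. The sketch only shows the general SE idea of pooling each channel to a single value, passing the result through a small bottleneck, and rescaling each channel by the resulting gate.

```python
import math
import random


def squeeze_excite(feature_map, w1, w2):
    """Reweight the channels of a feature map given as [C][H][W] nested lists.

    w1: [C_reduced][C] weights of the bottleneck layer (hypothetical values)
    w2: [C][C_reduced] weights of the expansion layer (hypothetical values)
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid, so each gate lies in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 4 channels of a 3x5 feature map, reduction ratio 2 (all assumed sizes).
rng = random.Random(0)
channels, height, width, reduced = 4, 3, 5, 2
features = [
    [[rng.random() for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(reduced)]
w2 = [[rng.uniform(-1, 1) for _ in range(reduced)] for _ in range(channels)]

scaled, gates = squeeze_excite(features, w1, w2)
print([round(g, 3) for g in gates])
```

In a trained network the two weight matrices are learned, so the gates would emphasize the channels that carry resampling traces. Here they are random and only demonstrate the data flow.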

Funder

National Natural Science Foundation of China

Natural Science Foundation of Shanghai

Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security

Innovation Fund for Industry-University-Research of Chinese Universities

Publisher

MDPI AG

