SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

Authors:

Chen Yuxuan (1), Zhang Jiangshan (2), Yuan Xuejing (2), Zhang Shengzhi (3), Chen Kai (2), Wang Xiaofeng (4), Guo Shanqing (1)

Affiliations:

1. School of Cyber Science and Technology, Shandong University, Qingdao, Shandong, China

2. SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences, Haidian, Beijing, China

3. Department of Computer Science, Metropolitan College, Boston University, Boston, MA, USA

4. School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, USA

Abstract

With the wide use of Automatic Speech Recognition (ASR) in applications such as human-machine interaction, simultaneous interpretation, and audio transcription, securing ASR systems has become increasingly important. Recent studies have exposed weaknesses in popular ASR systems that enable out-of-band signal attacks, adversarial attacks, and other threats, and have proposed various remedies (e.g., signal smoothing and adversarial training). However, a systematic understanding of ASR security, covering both attacks and defenses, is still missing, especially regarding how realistic such threats are and how well existing protections generalize. In this article, we present a systematization of knowledge (SoK) for ASR security and provide a comprehensive taxonomy of existing work based on a modularized ASR workflow. More importantly, we align research in this domain with research on the security of Image Recognition Systems (IRS), which has been studied extensively, and use the domain knowledge from the latter to assess where we stand in the former. Both IRS and ASR are perceptual systems. Their similarities allow us to systematically study the existing ASR security literature against the spectrum of attacks and defenses proposed for IRS, and to pinpoint directions for more advanced attacks and for potentially more effective protection in ASR. Their differences, in particular the greater complexity of ASR compared with IRS, reveal challenges and opportunities unique to ASR security. Notably, our experimental study shows that transfer attacks across ASR models are feasible even without knowledge of the target models (including their types) and their training data.

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; General Computer Science

References: 165 articles.


Cited by 9 articles.
