Abstract
Automatic Speech Recognition (ASR) systems have become ubiquitous. They can be found in a variety of form factors and are increasingly important in our daily lives. As such, ensuring that these systems are equitable to different subgroups of the population is crucial. In this paper, we introduce AequeVox, an automated testing framework for evaluating the fairness of ASR systems. AequeVox simulates different environments to assess the effectiveness of ASR systems for different populations. In addition, we investigate whether the chosen simulations are comprehensible to humans. We further propose a fault localization technique capable of identifying words that are not robust to these varying environments. Both components of AequeVox are able to operate in the absence of ground truth data. We evaluate AequeVox on speech from four different datasets using three different commercial ASRs. Our experiments reveal that non-native English, female, and Nigerian English speakers generate 109%, 528.5%, and 156.9% more errors, on average, than native English, male, and UK Midlands speakers, respectively. Our user study also reveals that 82.9% of the simulations (employed through speech transformations) had a comprehensibility rating above seven (out of ten), with the lowest rating being 6.78. This further validates the fairness violations discovered by AequeVox. Finally, we show that the non-robust words, as predicted by the fault localization technique embodied in AequeVox, exhibit 223.8% more errors than the predicted robust words across all ASRs.
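The abstract describes AequeVox only at a high level. As a rough illustration of the two ideas it names, the Python sketch below wires together one plausible environment simulation (white-noise injection; the paper's actual set of speech transformations is not shown here) and a ground-truth-free robustness check over transcript words. The function transcribe() is a hypothetical placeholder for a commercial ASR API call, and none of the names below come from the AequeVox implementation itself.

import numpy as np

def add_white_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Inject white noise at a target signal-to-noise ratio (dB).
    One representative 'environment simulation'; assumed, not AequeVox's exact transformation."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def transcribe(audio: np.ndarray) -> str:
    """Placeholder: send audio to a commercial ASR service and return its transcript."""
    raise NotImplementedError("wire up a real ASR API here")

def non_robust_words(audio: np.ndarray, n_trials: int = 5) -> set:
    """Flag words that disappear from the transcript under the transformation.
    No human-labeled ground truth is needed: the clean-audio transcript serves
    as the reference, mirroring the ground-truth-free setup the abstract describes."""
    reference = set(transcribe(audio).lower().split())
    unstable = set()
    for _ in range(n_trials):
        noisy_words = set(transcribe(add_white_noise(audio)).lower().split())
        unstable |= reference - noisy_words  # words lost under this trial
    return unstable

Words flagged by non_robust_words correspond to the abstract's "non-robust words": they drop out of the transcript under perturbation even though the clean transcript, rather than human-labeled ground truth, is the only reference.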
Publisher
Springer International Publishing
Cited by
7 articles.
1. Fairness Testing: A Comprehensive Survey and Analysis of Trends;ACM Transactions on Software Engineering and Methodology;2024-06-04
2. Elucidate Gender Fairness in Singing Voice Transcription;Proceedings of the 31st ACM International Conference on Multimedia;2023-10-26
3. Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing;Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis;2023-07-12
4. Latent Imitator: Generating Natural Individual Discriminatory Instances for Black-Box Fairness Testing;Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis;2023-07-12
5. ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems;2023 IEEE Conference on Software Testing, Verification and Validation (ICST);2023-04