Measuring the Accuracy of Automatic Speech Recognition Solutions

Author:

Kuhn Korbinian¹, Kersken Verena¹, Reuter Benedikt¹, Egger Niklas¹, Zimmermann Gottfried¹

Affiliation:

1. Stuttgart Media University, Germany

Abstract

For d/Deaf and hard of hearing (DHH) people, captioning is an essential accessibility tool. Significant developments in artificial intelligence mean that automatic speech recognition (ASR) is now a part of many popular applications. This makes creating captions easy and broadly available, but transcription needs high levels of accuracy to be accessible. Scientific publications and industry report very low error rates, claiming that artificial intelligence has reached human parity or even outperforms manual transcription. At the same time, the DHH community reports serious issues with the accuracy and reliability of ASR. There seems to be a mismatch between technical innovations and the real-life experience of people who depend on transcription. Independent and comprehensive data is needed to capture the state of ASR. We measured the performance of 11 common ASR services with recordings of higher education lectures. We evaluated the influence of technical conditions like streaming, the use of vocabularies, and differences between languages. Our results show that accuracy ranges widely between vendors and across individual audio samples. We also measured a significantly lower quality for streaming ASR, which is used for live events. Our study shows that, despite recent improvements in ASR, common services lack reliability in accuracy.

Funder

SHUFFLE

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications, Human-Computer Interaction
