Improving Readability for Automatic Speech Recognition Transcription

Author:

Liao Junwei 1, Eskimez Sefik Emre 2, Lu Liyang 2, Shi Yu 2, Gong Ming 3, Shou Linjun 3, Qu Hong 1, Zeng Michael 2

Affiliation:

1. University of Electronic Science and Technology of China, China

2. Microsoft Speech and Dialogue Research Group, USA

3. Microsoft STCA NLP Group, China

Abstract

Modern Automatic Speech Recognition (ASR) systems can achieve high recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read because of grammatical errors, disfluencies, and other noise common in spoken communication. These readability issues, introduced by both speakers and ASR systems, impair the performance of downstream tasks and the understanding of human readers. In this work, we present a task called ASR post-processing for readability (APR) and formulate it as a sequence-to-sequence text generation problem. The APR task aims to transform noisy ASR output into text that is readable for humans and usable by downstream tasks while preserving the speaker's intended meaning. We study the APR task from three angles: benchmark dataset, evaluation metrics, and baseline models. First, to address the lack of task-specific data, we propose a method to construct a dataset for the APR task from data collected for grammatical error correction. Second, we use metrics adapted or borrowed from similar tasks to evaluate model performance on the APR task. Lastly, we use several typical or adapted pre-trained models as baselines for the APR task. We fine-tune the baseline models on the constructed dataset and compare their performance with a traditional pipeline method in terms of the proposed evaluation metrics. Experimental results show that all the fine-tuned baseline models perform better than the traditional pipeline method, and that our adapted RoBERTa model outperforms the pipeline method by 4.95 and 6.63 BLEU points on the two test sets, respectively. The human evaluation and case study further demonstrate the ability of the proposed model to improve the readability of ASR transcripts.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 4 articles.

1. Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting;2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2023-12-16

2. Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model;2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2023-12-16

3. Audio Reading Assistant for Visually Impaired People;Advances in Cyber-Physical Systems;2023-11-10

4. The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches;Mathematics;2022-10-02
