Scribe

Published: 2017-08-23 Issue: 9 Volume: 60 Pages: 93-100
ISSN: 0001-0782
Container-title: Communications of the ACM
Language: en
Short-container-title: Commun. ACM

Author:

Lasecki Walter S.¹, Miller Christopher D.², Naim Iftekhar², Kushalnagar Raja³, Sadilek Adam², Gildea Daniel², Bigham Jeffrey P.⁴

Affiliation:

1. University of Michigan

2. University of Rochester

3. Gallaudet University

4. Carnegie Mellon University

Abstract

Quickly converting speech to text allows deaf and hard of hearing people to interactively follow along with live speech. Doing so reliably requires a combination of perception, understanding, and speed that neither humans nor machines possess alone. In this article, we discuss how our Scribe system combines human labor and machine intelligence in real time to reliably convert speech to text with less than 4s latency. To achieve this speed while maintaining high accuracy, Scribe integrates automated assistance in two ways. First, its user interface directs workers to different portions of the audio stream, slows down the portion they are asked to type, and adaptively determines segment length based on typing speed. Second, it automatically merges the partial input of multiple workers into a single transcript using a custom version of multiple-sequence alignment. Scribe illustrates the broad potential for deeply interleaving human labor and machine intelligence to provide intelligent interactive services that neither can currently achieve alone.

Funder

National Science Foundation

University of Michigan

Google

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3068663

References (32 articles)

1. Crowds in two seconds