A Comparison of Human and Statistical Language Model Performance using Missing-Word Tests

Published: 1997-10
Volume: 40, Issue: 4, Pages: 377-389
ISSN: 0023-8309
Journal: Language and Speech (Lang Speech)
Language: English

Authors:

Owens M.¹, O'Boyle P.¹, McMahon J.¹, Ming J.¹, Smith F. J.¹

Affiliation:

1. The Queen's University of Belfast

Abstract

This paper presents results from a series of missing-word tests, in which human subjects are shown a short fragment of text and asked to suggest a ranked list of completions for the missing word. The same experiment is repeated with the WA model, an n-gram statistical language model. Two measures are derived from the completion data: (i) verbatim predictability, the extent to which subjects nominated exactly the missing word, and (ii) grammatical class predictability, the extent to which subjects nominated words of the same grammatical class as the missing word. The differences between language model performance and human performance are encouragingly small, especially for verbatim predictability. This is particularly significant given that the WA model could, on average, use at most half of the available context. The results also highlight the superiority of human subjects in predicting missing content words. Most importantly, the experiments illustrate how much detailed information about the performance of a language model can be obtained by using missing-word tests.
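The two measures described in the abstract can be made concrete with a small sketch. This is not the WA model: it is a hypothetical weighted combination of unigram, bigram and trigram relative frequencies trained on a made-up toy corpus. The scoring rule (credit only the top-ranked guess) is also an assumption, since the abstract does not say how lower-ranked guesses are counted. The tag dictionary `TAGS`, the weights and all function names are illustrative. The model conditions only on words to the left of the gap, so it never sees the right-hand context available to a human reader.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; it stands in for real training text.
CORPUS = "the cat sat on the mat . the dog sat on the rug . the cat lay on the mat .".split()

# Hypothetical part-of-speech lookup, used only for the class measure.
TAGS = {"the": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
        "sat": "VERB", "lay": "VERB", "on": "PREP", ".": "PUNCT"}


def count_ngrams(tokens, max_n=3):
    """counts[n-1][history][word] = how often `word` follows the (n-1)-word history."""
    counts = [defaultdict(Counter) for _ in range(max_n)]
    for i, word in enumerate(tokens):
        for n in range(1, max_n + 1):
            if i < n - 1:  # not enough preceding words for this order
                break
            history = tuple(tokens[i - n + 1:i])
            counts[n - 1][history][word] += 1
    return counts


def rank_completions(counts, left_context, weights, k=5):
    """Rank candidate fillers by a weighted sum of relative frequencies across orders.

    Only the words to the LEFT of the gap are used (left-to-right n-gram model).
    """
    scores = Counter()
    for n, weight in enumerate(weights, start=1):
        if len(left_context) < n - 1:
            continue
        history = tuple(left_context[len(left_context) - (n - 1):])
        dist = counts[n - 1].get(history)
        if not dist:
            continue
        total = sum(dist.values())
        for word, c in dist.items():
            scores[word] += weight * c / total
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [word for word, _ in ranked[:k]]


def first_guess_rate(items, same):
    """Fraction of gaps whose top-ranked guess satisfies same(guess, target)."""
    hits = sum(1 for guesses, target in items if guesses and same(guesses[0], target))
    return hits / len(items)


def verbatim(guess, target):
    return guess == target


def same_class(guess, target):
    tag = TAGS.get(guess)
    return tag is not None and tag == TAGS.get(target)


counts = count_ngrams(CORPUS)
weights = (0.1, 0.3, 0.6)  # hypothetical unigram, bigram, trigram weights
tests = [("the cat sat on the", "mat"), ("the dog sat on the", "rug"), ("the cat", "lay")]
items = [(rank_completions(counts, ctx.split(), weights), target) for ctx, target in tests]

print(first_guess_rate(items, verbatim))    # → 0.3333333333333333
print(first_guess_rate(items, same_class))  # → 1.0
```

As long as every target word has a tag, an exact match always implies a class match, so the class score can never fall below the verbatim score. The gap between the two shows how often the model picks the right kind of word but the wrong word.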

Publisher

SAGE Publications

Subject

Speech and Hearing, Linguistics and Language, Sociology and Political Science, Language and Linguistics, General Medicine

Link

http://journals.sagepub.com/doi/pdf/10.1177/002383099704000404

References: 16 articles (4 listed)

1. Sources of contextual constraint upon words in sentences.

2. Estimating hidden Markov model parameters so as to maximize speech recognition accuracy

3. The predictability of words and their grammatical classes as a function of rate of deletion from a speech transcript

4. A language model for very large-vocabulary speech recognition

Cited by: 3 articles

1. Contextual Predictability of Texts for Texts Processing and Understanding. Mining Intelligence and Knowledge Exploration, 2020.

2. N-gram probability effects in a cloze task. The Mental Lexicon, 2014-12-31.

3. Creating a Spontaneous Conversational Speech Corpus. Data Science Journal, 2012.