Author:
Adolphs Svenja, Knight Dawn, Smith Catherine, Price Dominic
Abstract
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour-intensive and often restrictive in terms of breadth of recording contexts available. To overcome these potential challenges in spoken corpus compilation, we explore the use of crowdsourcing of language samples that are reported by participants. We investigate the level of precision and recall of the ‘crowd’ when it comes to reporting language they have heard in certain contexts, alongside the use of a crowdsourcing toolkit to facilitate this task. As a focussing device for the selection of reported language samples, we draw on the use of formulaic phrases as an area that has received considerable attention by corpus linguists and applied linguists over the years. We argue that while studying reported language usage instead of actual language-in-use is problematic for several reasons, many of which have been highlighted in the literature on Discourse Completion Tasks (Schauer and Adolphs, 2006), our suggested approach presents several advantages and opportunities for spoken corpus linguistics.
Publisher
Edinburgh University Press
Subject
Linguistics and Language, Language and Linguistics
Cited by
6 articles.
1. The Video Game Dialogue Corpus;Corpora;2024-04
2. Collocations, Corpora and Language Learning;2023-06-23
3. Corpora in Applied Linguistics;CAM APPL L;2022-04-21
4. 2.3 Cynllunio Corpws Cenedlaethol mewn Iaith Leiafrifoledig;Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig;2021
5. 1.3 Designing a National Corpus in a Minoritised Language;Corpus Design and Construction in Minoritised Language Contexts - Cynllunio a Chreu Corpws mewn Cyd-destunau Ieithoedd Lleiafrifoledig;2021