Abstract
Large databases of transcribed speech, downloadable from the Internet, are a corpus linguist's dream. They turn into a corpus linguist's nightmare, however, when the transcriptions are not linguistically accurate. In this paper I assess the suitability of the Hansard parliamentary transcripts (200 million words, downloadable) as a corpus linguistic resource, comparing a sample of the official transcript to a transcript made from a recording of a House of Commons session. The findings are that, as could be expected from earlier research, the transcripts omit performance characteristics of spoken language, such as incomplete utterances or hesitations, as well as any type of extrafactual, contextual talk (e.g., about turn-taking). Beyond these omissions, the transcribers and editors also alter speakers' lexical and grammatical choices towards more conservative and formal variants. Linguists ought, therefore, to be cautious in their use of the Hansard transcripts and, generally, in the use of transcriptions that have not been made for linguistic purposes.
Publisher
Edinburgh University Press
Subject
Linguistics and Language, Language and Linguistics
Cited by
86 articles.