Encoding document information in a corpus of student writing: the British Academic Written English corpus

Published: 2007-11 Volume: 2 Issue: 2 Pages: 241-256
ISSN: 1749-5032
Container-title: Corpora
Language: en

Authors:

Ebeling, Signe O.; Heuboeck, Alois

Abstract

The information contained in a document is only partly represented by the wording of the text; in addition, features of formatting and layout can be combined to lend specific functionality to chunks of text (e.g., section headings, highlighting, enumeration through list formatting, etc.). Such functional features, although based on the ‘objective’ typographical surface of the document, are often inconsistently realised and encoded only implicitly, i.e., they depend on deciphering by a competent reader. They are characteristic of documents produced with standard text-processing tools. We discuss the representation of such information with reference to the British Academic Written English (BAWE) corpus of student writing, currently under construction at the universities of Warwick, Reading and Oxford Brookes. Assignments are usually submitted to the corpus as Microsoft Word documents and make heavy use of surface-based functional features. As the documents are to be transformed into XML-encoded corpus files, this information can only be preserved through explicit annotation, based on interpretation. We present a discussion of the choices made in the BAWE corpus and the practical requirements for a tagging interface.

Publisher

Edinburgh University Press

Subject

Linguistics and Language, Language and Linguistics

References: 14 articles.

1. Corpus Design Criteria

2. Biber, D. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press.

3. Biber, D., S. Conrad and R. Reppen. 1998. Corpus Linguistics: Investigating Language Structure in Use. Cambridge: Cambridge University Press.

4. Burnard, L. 2000. Reference guide for the British National Corpus (World edition). L. Burnard (ed.). Published for the British National Corpus Consortium by the Humanities Computing Unit at Oxford University Computing Services. October 2000. Available online at: http://www.natcorp.ox.ac.uk/docs/userManual/ (Accessed 26 April 2006.)

5. Burnard, L. 2005. 'Metadata for corpus work' in M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice, pp. 30-46. Oxford: Oxbow Books. Available online at: http://ahds.ac.uk/linguistic-corpora/ (Accessed 28 March 2006.)

Cited by 12 articles.

1. Social media as digital research data;Corpus Approaches to Language in Social Media;2023-07-07

2. The Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing;Research in Corpus Linguistics;2022

3. Attribution in novice academic writing;English Text Construction;2021-12-31

4. Phraseological teddy bears;Corpus Linguistics, Context and Culture;2019-11-18

5. Review of Leedham (2015): Chinese Students Writing in English. Implications from a Corpus-Driven Study;International Journal of Learner Corpus Research;2016-07-08