Author:
Ebeling Signe O., Heuboeck Alois
Abstract
The information contained in a document is only partly represented by the wording of the text; in addition, features of formatting and layout can be combined to lend specific functionality to chunks of text (e.g., section headings, highlighting, enumeration through list formatting, etc.). Such functional features, although based on the ‘objective’ typographical surface of the document, are often inconsistently realised and encoded only implicitly, i.e., they depend on deciphering by a competent reader. They are characteristic of documents produced with standard text-processing tools. We discuss the representation of such information with reference to the British Academic Written English (BAWE) corpus of student writing, currently under construction at the universities of Warwick, Reading and Oxford Brookes. Assignments are usually submitted to the corpus as Microsoft Word documents and make heavy use of surface-based functional features. As the documents are to be transformed into XML-encoded corpus files, this information can only be preserved through explicit annotation, based on interpretation. We present a discussion of the choices made in the BAWE corpus and the practical requirements for a tagging interface.
Publisher
Edinburgh University Press
Subject
Linguistics and Language, Language and Linguistics
Cited by
12 articles.