Encoding document information in a corpus of student writing: the British Academic Written English corpus

Authors:

Ebeling, Signe O.; Heuboeck, Alois

Abstract

The information contained in a document is only partly represented by the wording of the text; in addition, features of formatting and layout can be combined to lend specific functionality to chunks of text (e.g., section headings, highlighting, enumeration through list formatting). Such functional features, although based on the ‘objective’ typographical surface of the document, are often inconsistently realised and encoded only implicitly, i.e., they must be deciphered by a competent reader. They are characteristic of documents produced with standard text-processing tools. We discuss the representation of such information with reference to the British Academic Written English (BAWE) corpus of student writing, currently under construction at the universities of Warwick, Reading and Oxford Brookes. Assignments are usually submitted to the corpus as Microsoft Word documents and make heavy use of surface-based functional features. As the documents are to be transformed into XML-encoded corpus files, this information can only be preserved through explicit annotation, based on interpretation. We present the choices made in the BAWE corpus and set out the practical requirements for a tagging interface.
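The central step described above, turning implicit surface formatting into explicit, interpreted markup, can be illustrated with a minimal Python sketch. It assumes that an annotator (or a tagging interface) has already assigned each paragraph a role (`heading`, `item`, or plain paragraph) and flagged highlighted runs; the `Block` class and `encode` function are hypothetical, and the TEI-style element names (`head`, `p`, `list`, `item`, `hi`) and the `rend="bold"` value are assumptions rather than the documented BAWE scheme.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Block:
    """One paragraph of the source document after human interpretation (hypothetical)."""
    role: str   # annotator's decision: "heading", "item", or anything else for a plain paragraph
    runs: list  # (text, highlighted) pairs taken from the word-processor surface


def _add_runs(parent, runs):
    """Append mixed content: plain text goes to .text/.tail, highlighted text to <hi>."""
    last = None
    for text, highlighted in runs:
        if highlighted:
            # rend="bold" is an assumed rendering value, not taken from BAWE
            last = ET.SubElement(parent, "hi", rend="bold")
            last.text = text
        elif last is None:
            parent.text = (parent.text or "") + text
        else:
            last.tail = (last.tail or "") + text


def encode(blocks):
    """Map interpreted blocks onto explicit TEI-style elements inside <body>."""
    body = ET.Element("body")
    current_list = None  # consecutive "item" blocks share one <list>
    for block in blocks:
        if block.role == "item":
            if current_list is None:
                current_list = ET.SubElement(body, "list")
            parent = ET.SubElement(current_list, "item")
        else:
            current_list = None  # any non-item block closes the open list
            parent = ET.SubElement(body, "head" if block.role == "heading" else "p")
        _add_runs(parent, block.runs)
    return body


blocks = [
    Block("heading", [("Introduction", False)]),
    Block("para", [("This essay argues that ", False), ("context", True), (" matters.", False)]),
    Block("item", [("first point", False)]),
    Block("item", [("second point", False)]),
]
print(ET.tostring(encode(blocks), encoding="unicode"))
# prints: <body><head>Introduction</head><p>This essay argues that <hi rend="bold">context</hi> matters.</p><list><item>first point</item><item>second point</item></list></body>
```

Keeping `role` as an explicit annotator decision, rather than inferring it from font size or bullet characters, reflects the point that such features are inconsistently realised and can only be preserved through interpretation.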

Publisher

Edinburgh University Press

Subject

Linguistics and Language, Language and Linguistics

Cited by 12 articles.

1. Social media as digital research data. Corpus Approaches to Language in Social Media, 2023-07-07.

2. The Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing. Research in Corpus Linguistics, 2022.

3. Attribution in novice academic writing. English Text Construction, 2021-12-31.

4. Phraseological teddy bears. Corpus Linguistics, Context and Culture, 2019-11-18.

5. Review of Leedham (2015): Chinese Students Writing in English. Implications from a Corpus-Driven Study. International Journal of Learner Corpus Research, 2016-07-08.
