Abstract
Researchers claim (see Egbert, 2018) that, despite the growing number of corpora, the challenges of corpus creation and analysis receive insufficient research attention and discussion. The ongoing international project LEXECON (2021-2024) raises awareness of these issues. The aim of this study is twofold: first, to explore the stages of corpus creation in relation to compilation criteria; and second, to pilot the functionality of the created subcorpus by examining variation in first-person pronouns in order to uncover subjectivity across the subcorpus genres. The pronouns were explored by observing their relative frequency, context, and surplus-deficit index. Two corpus analysis tools, Sketch Engine and Hyperbase 10, were applied. The corpus creation results confirm that balance is the most challenging corpus criterion to fulfil, whereas corpus editing is the most time-consuming stage of corpus creation. The results obtained via first-person pronoun extraction confirm that the context and the surplus-deficit index contribute to the research results no less than the relative frequency data. The analysis of variation in the personal pronoun data shows that essays contain the fewest first-person singular pronouns; in the other genres, however, these pronouns often do not convey an authorial stance. Moreover, a greater surplus of the possessive case, as opposed to the objective case, reflects a more active authorial stance.
References (38 articles)
1. Baker, P., Hardie, A. and McEnery, T. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.
2. Biber, D. (1993) Representativeness in corpus design. Literary and Linguistic Computing, 8 (4): 243-257.
3. Biber, D., Johansson, S., Leech, G. N., Conrad, S. and Finegan, E. (2021) Grammar of Spoken and Written English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
4. Baumgarten, N., Du Bois, I. and House, J. (2012) Introduction. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 1-14). Bingley: Emerald Group Publishing Limited.
5. Bollmann, M. (2019) A large-scale comparison of historical text normalization systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers), (pp. 3885-3898). Minneapolis, Minnesota: Association for Computational Linguistics.