Affiliation:
1. Beijing Foreign Studies University
2. Sichuan International Studies University
3. University of Birmingham
Abstract
Certain forms of linguistic annotation, such as part-of-speech and semantic tagging, can be automated with high
accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping
to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches
in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate
pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model
underpinning the precise mode of the Bing chatbot), and a human coder in annotating apology components in English based on the local
grammar framework. We find that GPT-4 outperforms GPT-3.5, with accuracy approaching that of the human coder. These results suggest
that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable,
and accessible.
Publisher
John Benjamins Publishing Company